Preface


Important though the general concepts and propositions may be with 
which the modern and industrious passion for axiomatizing and generalizing 
has presented us, in algebra perhaps more than anywhere else, nevertheless I
am convinced that the special problems in all their complexity constitute the
stock and core of mathematics, and that to master their difficulties requires 
on the whole the harder labor. 


—Herman Weyl 


This book began many years ago in the form of supplementary notes for my algebra classes. 
I wanted to discuss some concrete topics such as symmetry, linear groups, and quadratic 
number fields in more detail than the text provided, and to shift the emphasis in group theory 
from permutation groups to matrix groups. Lattices, another recurring theme, appeared 
spontaneously. 

My hope was that the concrete material would interest the students and that it would 
make the abstractions more understandable - in short, that they could get farther by learning 
both at the same time. This worked pretty well. It took me quite a while to decide what to 
include, but I gradually handed out more notes and eventually began teaching from them 
without another text. Though this produced a book that is different from most others, the 
problems I encountered while fitting the parts together caused me many headaches. I can’t
recommend the method. 

There is more emphasis on special topics here than in most algebra books. They tended 
to expand when the sections were rewritten, because I noticed over the years that, in contrast 
to abstract concepts, with concrete mathematics students often prefer more to less. As a 
result, the topics mentioned above have become major parts of the book. 

In writing the book, I tried to follow these principles: 


1. The basic examples should precede the abstract definitions. 
2. Technical points should be presented only if they are used elsewhere in the book. 
3. All topics should be important for the average mathematician. 


Although these principles may sound like motherhood and the flag, I found it useful to have 
them stated explicitly. They are, of course, violated here and there. 

The chapters are organized in the order in which I usually teach a course, with linear 
algebra, group theory, and geometry making up the first semester. Rings are first introduced 
in Chapter 11, though that chapter is logically independent of many earlier ones. I chose 
this arrangement to emphasize the connections of algebra with geometry at the start, and
because, overall, the material in the first chapters is the most important for people in other 
fields. The first half of the book doesn’t emphasize arithmetic, but this is made up for in the 
later chapters. 


About This Second Edition 


The text has been rewritten extensively, incorporating suggestions by many people as well as 
the experience of teaching from it for 20 years. I have distributed revised sections to my class 
all along, and for the past two years the preliminary versions have been used as texts. As a 
result, I’ve received many valuable suggestions from the students. The overall organization 
of the book remains unchanged, though I did split two chapters that seemed long. 

There are a few new items. None are lengthy, and they are balanced by cuts made 
elsewhere. Some of the new items are an early presentation of Jordan form (Chapter 4), a 
short section on continuity arguments (Chapter 5), a proof that the alternating groups are 
simple (Chapter 7), short discussions of spheres (Chapter 9), product rings (Chapter 11), 
computer methods for factoring polynomials and Cauchy’s Theorem bounding the roots of a 
polynomial (Chapter 12), and a proof of the Splitting Theorem based on symmetric functions 
(Chapter 16). I’ve also added a number of nice exercises. But the book is long enough, so 
I’ve tried to resist the temptation to add material. 


NOTES FOR THE TEACHER 


This book is designed to allow you to choose among the topics. Don’t try to cover the book, 
but do include some of the interesting special topics such as symmetry of plane figures, the 
geometry of SU2, or the arithmetic of imaginary quadratic number fields. If you don’t want 
to discuss such things in your course, then this is not the book for you. 

There are relatively few prerequisites. Students should be familiar with calculus, the 
basic properties of the complex numbers, and mathematical induction. An acquaintance with 
proofs is obviously useful. The concepts from topology that are used in Chapter 9, Linear 
Groups, should not be regarded as prerequisites. 

I recommend that you pay attention to concrete examples, especially throughout the 
early chapters. This is very important for the students who come to the course without a 
clear idea of what constitutes a proof. 

One could spend an entire semester on the first five chapters, but since the real fun 
starts with symmetry in Chapter 6, that would defeat the purpose of the book. Try to get 
to Chapter 6 as soon as possible, so that it can be done at a leisurely pace. In spite of its 
immediate appeal, symmetry isn’t an easy topic. It is easy to be carried away and leave the 
students behind. 

These days most of the students in my classes are familiar with matrix operations and 
modular arithmetic when they arrive. I’ve not been discussing the first chapter on matrices 
in class, though I do assign problems from that chapter. Here are some suggestions for 
Chapter 2, Groups. 


1. Treat the abstract material with a light touch. You can have another go at it in Chapters 6 
and 7. 
2. For examples, concentrate on matrix groups. Examples from symmetry are best deferred
to Chapter 6. 

3. Don’t spend much time on arithmetic; its natural place in this book is in Chapters 12 
and 13. 

4. De-emphasize the quotient group construction. 


Quotient groups present a pedagogical problem. While their construction is concep- 
tually difficult, the quotient is readily presented as the image of a homomorphism in most 
elementary examples, and then it does not require an abstract definition. Modular arithmetic 
is about the only convincing example for which this is not the case. And since the integers 
modulo n form a ring, modular arithmetic isn’t the ideal motivating example for quotients 
of groups. The first serious use of quotient groups comes when generators and relations are 
discussed in Chapter 7. I deferred the treatment of quotients to that point in early drafts 
of the book, but, fearing the outrage of the algebra community, I eventually moved it to 
Chapter 2. If you don’t plan to discuss generators and relations for groups in your course, 
then you can defer an in-depth treatment of quotients to Chapter 11, Rings, where they play 
a central role, and where modular arithmetic becomes a prime motivating example.

In Chapter 3, Vector Spaces, I’ve tried to set up the computations with bases in such a 
way that the students won’t have trouble keeping the indices straight. Since the notation is 
used throughout the book, it may be advisable to adopt it. 

The matrix exponential that is defined in Chapter 5 is used in the description of one- 
parameter groups in Chapter 10, so if you plan to include one-parameter groups, you will 
need to discuss the matrix exponential at some point. But you must resist the temptation to 
give differential equations their due. You will be forgiven because you are teaching algebra. 

Except for its first two sections, Chapter 7, again on groups, contains optional material. 
A section on the Todd-Coxeter algorithm is included to justify the discussion of generators 
and relations, which is pretty useless without it. It is fun, too. 

There is nothing unusual in Chapter 8, on bilinear forms. I haven’t overcome the main 
pedagogical problem with this topic — that there are too many variations on the same theme, 
but have tried to keep the discussion short by concentrating on the real and complex cases. 

In the chapter on linear groups, Chapter 9, plan to spend time on the geometry of SU2. 
My students complained about that chapter every year until I expanded the section on SU2, 
after which they began asking for supplementary reading, wanting to learn more. Many of 
our students aren’t familiar with the concepts from topology when they take the course, but 
I’ve found that the problems caused by the students’ lack of familiarity can be managed. 
Indeed, this is a good place for them to get an idea of a manifold. 

I resisted including group representations, Chapter 10, for a number of years, on the 
grounds that it is too hard. But students often requested it, and I kept asking myself: If the 
chemists can teach it, why can’t we? Eventually the internal logic of the book won out and 
group representations went in. As a dividend, hermitian forms got an application. 

You may find the discussion of quadratic number fields in Chapter 13 too long for a 
general algebra course. With this possibility in mind, I’ve arranged the material so that the 
end of Section 13.4, on ideal factorization, is a natural stopping point. 

It seemed to me that one should mention the most important examples of fields in a 
beginning algebra course, so I put a discussion of function fields into Chapter 15. There is 
always the question of whether or not Galois theory should be presented in an undergraduate
course, but as a culmination of the discussion of symmetry, it belongs here. 

Some of the harder exercises are marked with an asterisk. 

Though I’ve taught algebra for years, various aspects of this book remain experimental, 
and I would be very grateful for critical comments and suggestions from the people who use it. 

“One, two, three, five, four...”
“No Daddy, it’s one, two, three, four, five.”
“Well if I want to say one, two, three, five, four, why can’t I?”
“That’s not how it goes.”


—Carolyn Artin 


CHAPTER 1 


Matrices 


Erstlich wird alles dasjenige eine Größe genennt,
welches einer Vermehrung oder einer Verminderung fähig ist,
oder wozu sich noch etwas hinzusetzen oder davon wegnehmen läßt.


—Leonhard Euler! 


Matrices play a central role in this book. They form an important part of the theory, and 
many concrete examples are based on them. Therefore it is essential to develop facility in 
matrix manipulation. Since matrices pervade mathematics, the techniques you will need are 
sure to be useful elsewhere. 


1.1 THE BASIC OPERATIONS 


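Let $m$ and $n$ be positive integers. An $m \times n$ matrix is a collection of $mn$ numbers arranged
in a rectangular array

$$\text{$m$ rows}\quad \overbrace{\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}}^{\text{$n$ columns}} \tag{1.1.1}$$

For example, $\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix}$ is a $2 \times 3$ matrix (two rows and three columns). We usually introduce
a symbol such as $A$ to denote a matrix.

The numbers in a matrix are the matrix entries. They may be denoted by $a_{ij}$, where $i$
and $j$ are indices (integers) with $1 \le i \le m$ and $1 \le j \le n$; the index $i$ is the row index, and
$j$ is the column index. So $a_{ij}$ is the entry that appears in the $i$th row and $j$th column of the
matrix:

$$\begin{bmatrix} & \vdots & \\ \cdots & a_{ij} & \cdots \\ & \vdots & \end{bmatrix}$$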


1 This is the opening sentence of Euler’s book Algebra, which was published in St. Petersburg in 1770.




In the above example, $a_{11} = 2$, $a_{13} = 0$, and $a_{23} = 5$. We sometimes denote the matrix
whose entries are $a_{ij}$ by $(a_{ij})$.

An $n \times n$ matrix is called a square matrix. A $1 \times 1$ matrix $[a]$ contains a single number,
and we do not distinguish such a matrix from its entry.

A $1 \times n$ matrix is an $n$-dimensional row vector. We drop the index $i$ when $m = 1$ and
write a row vector as

$$[a_1 \ \cdots \ a_n], \quad\text{or as}\quad (a_1, \ldots, a_n).$$

Commas in such a row vector are optional. Similarly, an $m \times 1$ matrix is an
$m$-dimensional column vector:

$$\begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}.$$

In most of this book, we won’t make a distinction between an $n$-dimensional column vector
and the point of $n$-dimensional space with the same coordinates. In the few places where the
distinction is useful, we will state this clearly.


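Addition of matrices is defined in the same way as vector addition. Let $A = (a_{ij})$ and
$B = (b_{ij})$ be two $m \times n$ matrices. Their sum $A + B$ is the $m \times n$ matrix $S = (s_{ij})$ defined by

$$s_{ij} = a_{ij} + b_{ij}.$$

Thus

$$\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix} + \begin{bmatrix} 1 & 0 & 3 \\ 4 & -3 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 1 & 3 \\ 5 & 0 & 6 \end{bmatrix}.$$

Addition is defined only when the matrices to be added have the same shape, that is, when they
are $m \times n$ matrices with the same $m$ and $n$.

Scalar multiplication of a matrix by a number is also defined as with vectors. The result
of multiplying an $m \times n$ matrix $A$ by a number $c$ is another $m \times n$ matrix $B = (b_{ij})$, where
$b_{ij} = c\,a_{ij}$ for all $i, j$. Thus

$$2 \begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix} = \begin{bmatrix} 4 & 2 & 0 \\ 2 & 6 & 10 \end{bmatrix}.$$
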
Numbers will also be referred to as scalars. Let’s assume for now that the scalars are real 
numbers. In later chapters other scalars will appear. Just keep in mind that, except for 
occasional reference to the geometry of real two- or three-dimensional space, everything in 
this chapter continues to hold when the scalars are complex numbers. 
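
The two entrywise operations can be tried out on a computer. The following is a minimal sketch in Python, assuming a matrix is stored as a list of rows; the function names `mat_add` and `scalar_mul` are illustrative choices, not notation from the text.

```python
def mat_add(A, B):
    """Entrywise sum of two matrices of the same shape (lists of rows)."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("addition needs matrices of the same shape")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mul(c, A):
    """Multiply every entry of the matrix A by the scalar c."""
    return [[c * a for a in row] for row in A]


A = [[2, 1, 0], [1, 3, 5]]
B = [[1, 0, 3], [4, -3, 1]]
total = mat_add(A, B)        # the sum computed in the text
doubled = scalar_mul(2, A)   # the scalar multiple computed in the text
```

Trying `mat_add` on matrices of different shapes raises an error, which mirrors the requirement that the shapes agree.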


The complicated operation is matrix multiplication. The first case to learn is the product
$AB$ of a row vector $A$ and a column vector $B$, which is defined when both are the same size,
say $m$. If the entries of $A$ and $B$ are denoted by $a_i$ and $b_i$, respectively, the product $AB$ is the
$1 \times 1$ matrix, or scalar,

$$a_1 b_1 + a_2 b_2 + \cdots + a_m b_m. \tag{1.1.2}$$

Thus

$$\begin{bmatrix} 1 & 3 & 5 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 4 \end{bmatrix} = 1 - 3 + 20 = 18.$$

The usefulness of this definition becomes apparent when we regard $A$ and $B$ as vectors that
represent indexed quantities. For example, consider a candy bar containing $m$ ingredients.
Let $a_i$ denote the number of grams of (ingredient)$_i$ per bar, and let $b_i$ denote the cost of
(ingredient)$_i$ per gram. The matrix product $AB$ computes the cost per bar:

$$(\text{grams/bar}) \cdot (\text{cost/gram}) = (\text{cost/bar}).$$
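
Rule (1.1.2) is short enough to check by machine. Here is a small Python sketch; the function name `row_times_column` and the candy bar figures are made up for illustration and do not come from the text.

```python
def row_times_column(a, b):
    """Product (1.1.2) of a row vector a and a column vector b of the same size."""
    if len(a) != len(b):
        raise ValueError("row and column must have the same number of entries")
    return sum(x * y for x, y in zip(a, b))


# The numerical example from the text: 1 - 3 + 20.
product = row_times_column([1, 3, 5], [1, -1, 4])

# Hypothetical candy bar with three ingredients.
grams_per_bar = [30, 15, 5]     # grams of each ingredient in one bar
cost_per_gram = [2, 4, 10]      # cost of each ingredient, in cents per gram
cost_per_bar = row_times_column(grams_per_bar, cost_per_gram)  # 60 + 60 + 50
```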


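In general, the product of two matrices $A = (a_{ij})$ and $B = (b_{ij})$ is defined when the
number of columns of $A$ is equal to the number of rows of $B$. If $A$ is an $\ell \times m$ matrix and $B$ is
an $m \times n$ matrix, then the product will be an $\ell \times n$ matrix. Symbolically,

$$(\ell \times m) \cdot (m \times n) = (\ell \times n).$$

The entries of the product matrix are computed by multiplying all rows of $A$ by all columns
of $B$, using the rule (1.1.2). If we denote the product matrix $AB$ by $P = (p_{ij})$, then

$$p_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{im} b_{mj}. \tag{1.1.3}$$

This is the product of the $i$th row of $A$ and the $j$th column of $B$.

$$\begin{bmatrix} & & \\ a_{i1} & \cdots & a_{im} \\ & & \end{bmatrix} \begin{bmatrix} & b_{1j} & \\ & \vdots & \\ & b_{mj} & \end{bmatrix} = \begin{bmatrix} & \vdots & \\ \cdots & p_{ij} & \cdots \\ & \vdots & \end{bmatrix}$$

For example,

$$\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 4 \end{bmatrix} = \begin{bmatrix} 1 \\ 18 \end{bmatrix}. \tag{1.1.4}$$
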
This definition of matrix multiplication has turned out to provide a very convenient
computational tool. Going back to our candy bar example, suppose that there are $\ell$ candy
bars. We may form the $\ell \times m$ matrix $A$ whose $i$th row measures the ingredients of (bar)$_i$. If
the cost is to be computed each year for $n$ years, we may form the $m \times n$ matrix $B$ whose $j$th
column measures the cost of the ingredients in (year)$_j$. Again, the matrix product $AB = P$
computes cost per bar: $p_{ij}$ = cost of (bar)$_i$ in (year)$_j$.
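
The general rule (1.1.3) translates directly into a short program. The sketch below is in Python, with matrices stored as lists of rows; `mat_mul` is an illustrative name, and the two-bar, two-year price table is invented for the example.

```python
def mat_mul(A, B):
    """Product of an l x m matrix A and an m x n matrix B, using rule (1.1.3)."""
    m = len(B)
    if any(len(row) != m for row in A):
        raise ValueError("columns of A must equal rows of B")
    n = len(B[0])
    # p_ij is the product of the i-th row of A and the j-th column of B.
    return [[sum(A[i][v] * B[v][j] for v in range(m)) for j in range(n)]
            for i in range(len(A))]


# Example (1.1.4) from the text.
P = mat_mul([[2, 1, 0], [1, 3, 5]], [[1], [-1], [4]])

# Hypothetical data: 2 bars x 3 ingredients, and 3 ingredients x 2 years.
grams = [[30, 15, 5],
         [20, 20, 10]]
prices = [[2, 3],
          [4, 4],
          [10, 12]]
cost = mat_mul(grams, prices)   # cost[i][j] = cost of bar i+1 in year j+1
```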


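One reason for matrix notation is to provide a shorthand way of writing linear
equations. The system of equations

$$\begin{aligned} a_{11} x_1 + \cdots + a_{1n} x_n &= b_1 \\ a_{21} x_1 + \cdots + a_{2n} x_n &= b_2 \\ &\ \,\vdots \\ a_{m1} x_1 + \cdots + a_{mn} x_n &= b_m \end{aligned}$$

can be written in matrix notation as

$$AX = B \tag{1.1.5}$$

where $A$ denotes the matrix of coefficients, $X$ and $B$ are column vectors, and $AX$ is the
matrix product:

$$A \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}.$$

We may refer to an equation of this form simply as an “equation” or as a “system.”

The matrix equation

$$\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 18 \end{bmatrix}$$

represents the following system of two equations in three unknowns:

$$\begin{aligned} 2x_1 + x_2 &= 1 \\ x_1 + 3x_2 + 5x_3 &= 18. \end{aligned}$$

Equation (1.1.4) exhibits one solution, $x_1 = 1$, $x_2 = -1$, $x_3 = 4$. There are others.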
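
To see concretely that there are other solutions, one can substitute candidates into the matrix equation. The Python sketch below does this; the family $(t,\ 1 - 2t,\ 3 + t)$ was worked out by hand for this illustration and is not stated in the text, and the helper names are arbitrary.

```python
def mat_mul(A, B):
    """Matrix product by rule (1.1.3); A is l x m and B is m x n."""
    return [[sum(A[i][v] * B[v][j] for v in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def solves(A, X, B):
    """True if the column vector with entries X satisfies AX = B."""
    return mat_mul(A, [[x] for x in X]) == [[b] for b in B]


A = [[2, 1, 0], [1, 3, 5]]
B = [1, 18]

# Every integer t gives a solution (t, 1 - 2t, 3 + t); t = 1 is the one in (1.1.4).
family = [(t, 1 - 2 * t, 3 + t) for t in range(-3, 4)]
all_solve = all(solves(A, X, B) for X in family)
not_a_solution = solves(A, (1, 1, 1), B)   # 2 + 1 is not 1
```
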
The sum (1.1.3) that defines the product matrix can also be written in summation or
“sigma” notation as

$$p_{ij} = \sum_{\nu=1}^{m} a_{i\nu} b_{\nu j} = \sum_{\nu} a_{i\nu} b_{\nu j}. \tag{1.1.6}$$

Each of these expressions for $p_{ij}$ is a shorthand notation for the sum. The large sigma
indicates that the terms with the indices $\nu = 1, \ldots, m$ are to be added up. The right-hand
notation indicates that one should add the terms with all possible indices $\nu$. It is assumed
that the reader will understand that, if $A$ is an $\ell \times m$ matrix and $B$ is an $m \times n$ matrix, the
indices should run from 1 to $m$. We’ve used the Greek letter “nu,” an uncommon symbol
elsewhere, to distinguish the index of summation clearly.

Our two most important notations for handling sets of numbers are the summation 
notation, as used above, and matrix notation. The summation notation is the more versatile 
of the two, but because matrices are more compact, we use them whenever possible. One 
of our tasks in later chapters will be to translate complicated mathematical structures into 
matrix notation in order to be able to work with them conveniently. 

Various identities are satisfied by the matrix operations. The distributive laws 


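$$A(B + B') = AB + AB', \quad\text{and}\quad (A + A')B = AB + A'B \tag{1.1.7}$$

and the associative law

$$(AB)C = A(BC) \tag{1.1.8}$$

are among them. These laws hold whenever the matrices involved have suitable sizes, so
that the operations are defined. For the associative law, the sizes should be $A = \ell \times m$,
$B = m \times n$, and $C = n \times p$, for some $\ell, m, n, p$. Since the two products (1.1.8) are equal,
parentheses are not necessary, and we will denote the triple product by $ABC$. It is an $\ell \times p$
matrix. For example, the two ways of computing the triple product

$$ABC = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix}$$

are

$$(AB)C = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 0 & 2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix} \quad\text{and}\quad A(BC) = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \begin{bmatrix} 2 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix}.$$

Scalar multiplication is compatible with matrix multiplication in the obvious sense:

$$c(AB) = (cA)B = A(cB). \tag{1.1.9}$$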
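
A numerical check is no substitute for a proof, but it is a quick way to catch a misremembered identity. The Python sketch below verifies (1.1.8) and (1.1.9) on the triple product above; the helper names and the scalar $c = 3$ are arbitrary choices for illustration.

```python
def mat_mul(A, B):
    """Matrix product by rule (1.1.3); A is l x m and B is m x n."""
    return [[sum(A[i][v] * B[v][j] for v in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def scalar_mul(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]


A = [[1], [2]]                    # 2 x 1
B = [[1, 0, 1]]                   # 1 x 3
C = [[2, 0], [1, 1], [0, 1]]      # 3 x 2

left = mat_mul(mat_mul(A, B), C)      # (AB)C
right = mat_mul(A, mat_mul(B, C))     # A(BC)

c = 3
forms = [scalar_mul(c, mat_mul(A, B)),      # c(AB)
         mat_mul(scalar_mul(c, A), B),      # (cA)B
         mat_mul(A, scalar_mul(c, B))]      # A(cB)
```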


The proofs of these identities are straightforward and not very interesting. 
However, the commutative law does not hold for matrix multiplication, that is, 


$$AB \ne BA, \quad\text{usually.} \tag{1.1.10}$$

Even when both matrices are square, the two products tend to be different. For instance,

$$\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 0 & 0 \end{bmatrix}, \quad\text{while}\quad \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}.$$

If it happens that $AB = BA$, the two matrices are said to commute.

Since matrix multiplication isn’t commutative, we must be careful when working with
matrix equations. We can multiply both sides of an equation $B = C$ on the left by a
matrix $A$, to conclude that $AB = AC$, provided that the products are defined. Similarly,
if the products are defined, we can conclude that $BA = CA$. We cannot derive $AB = CA$
from $B = C$.
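
The failure of the commutative law is easy to observe experimentally. In the Python sketch below, the pair of $2 \times 2$ matrices and the helper name `commute` are chosen for illustration only.

```python
def mat_mul(A, B):
    """Matrix product by rule (1.1.3); A is l x m and B is m x n."""
    return [[sum(A[i][v] * B[v][j] for v in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def commute(A, B):
    """True if the square matrices A and B satisfy AB = BA."""
    return mat_mul(A, B) == mat_mul(B, A)


A = [[1, 1], [0, 0]]
B = [[1, 0], [1, 1]]
AB = mat_mul(A, B)
BA = mat_mul(B, A)

# Any square matrix commutes with itself, so commuting pairs do exist.
self_commutes = commute(B, B)
```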

A matrix all of whose entries are 0 is called a zero matrix, and if there is no danger of 
confusion, it will be denoted simply by 0. 

The entries $a_{ii}$ of a matrix $A$ are its diagonal entries. A matrix $A$ is a diagonal matrix
if its only nonzero entries are diagonal entries. (The word nonzero simply means “different
from zero.” It is ugly, but so convenient that we will use it frequently.)

The diagonal $n \times n$ matrix all of whose diagonal entries are equal to 1 is called the $n \times n$
identity matrix, and is denoted by $I_n$. It behaves like the number 1 in multiplication: If $A$ is
an $m \times n$ matrix, then

$$A I_n = A \quad\text{and}\quad I_m A = A. \tag{1.1.11}$$

We usually omit the subscript and write $I$ for $I_n$.
Here are some shorthand ways of depicting the identity matrix:

$$I = \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix} = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix}.$$

We often indicate that a whole region in a matrix consists of zeros by leaving it blank or by
putting in a single 0.
We use $*$ to indicate an arbitrary undetermined entry of a matrix. Thus

$$\begin{bmatrix} * & \cdots & * \\ & \ddots & \vdots \\ 0 & & * \end{bmatrix}$$

may denote a square matrix $A$ whose entries below the diagonal are 0, the other entries
being undetermined. Such a matrix is called upper triangular. The matrices that appear in
(1.1.14) below are upper triangular.


Let $A$ be a (square) $n \times n$ matrix. If there is a matrix $B$ such that

$$AB = I_n \quad\text{and}\quad BA = I_n, \tag{1.1.12}$$

then $B$ is called an inverse of $A$ and is denoted by $A^{-1}$:

$$A^{-1}A = I = AA^{-1}. \tag{1.1.13}$$

A matrix $A$ that has an inverse is called an invertible matrix.

For example, the matrix $A = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}$ is invertible. Its inverse is $A^{-1} = \begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix}$, as
can be seen by computing the products $AA^{-1}$ and $A^{-1}A$. Two more examples:


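$$\begin{bmatrix} 1 & \\ & 2 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & \\ & \tfrac{1}{2} \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 1 \\ & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & -1 \\ & 1 \end{bmatrix}. \tag{1.1.14}$$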


We will see later that a square matrix $A$ is invertible if there is a matrix $B$ such that either
one of the two relations $AB = I_n$ or $BA = I_n$ holds, and that $B$ is then the inverse (see
(1.2.20)). But since multiplication of matrices isn’t commutative, this fact is not obvious. On
the other hand, an inverse is unique if it exists. The next lemma shows that there can be only 
one inverse of a matrix A: 


Lemma 1.1.15 Let $A$ be a square matrix that has a right inverse, a matrix $R$ such that $AR = I$,
and also a left inverse, a matrix $L$ such that $LA = I$. Then $R = L$. So $A$ is invertible and $R$ is
its inverse.


Proof. $R = IR = (LA)R = L(AR) = LI = L$. □


Proposition 1.1.16 Let $A$ and $B$ be invertible $n \times n$ matrices. The product $AB$ and the inverse
$A^{-1}$ are invertible, $(AB)^{-1} = B^{-1}A^{-1}$ and $(A^{-1})^{-1} = A$. If $A_1, \ldots, A_m$ are invertible $n \times n$
matrices, the product $A_1 \cdots A_m$ is invertible, and its inverse is $A_m^{-1} \cdots A_1^{-1}$.


Proof. Assume that $A$ and $B$ are invertible. To show that the product $B^{-1}A^{-1} = Q$ is the
inverse of $AB = P$, we simplify the products $PQ$ and $QP$, obtaining $I$ in both cases. The
verification of the other assertions is similar. □
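
For readers who want the simplification spelled out, here is one way to write it. Each step uses only the associative law (1.1.8) and the defining relations (1.1.13) of an inverse:

```latex
\begin{aligned}
PQ &= (AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = A\,I\,A^{-1} = AA^{-1} = I, \\
QP &= (B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}\,I\,B = B^{-1}B = I.
\end{aligned}
```

The order of the factors matters here: the inner pair cancels only because $B^{-1}$ sits next to $B$ in $PQ$, and $A^{-1}$ next to $A$ in $QP$.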


nome [NE HED «fa 


• It is worthwhile to memorize the inverse of a $2 \times 2$ matrix:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \tag{1.1.17}$$

The denominator $ad - bc$ is the determinant of the matrix. If the determinant is zero, the
matrix is not invertible. We discuss determinants in Section 1.4.
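
Formula (1.1.17) can be coded in a few lines. The Python sketch below assumes integer entries and uses the standard library's `Fraction` type so that the division by $ad - bc$ is exact; the function names are illustrative.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of a 2 x 2 integer matrix by formula (1.1.17)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("determinant is zero; the matrix is not invertible")
    s = Fraction(1, det)
    return [[s * d, s * -b], [s * -c, s * a]]


def mat_mul(A, B):
    """Matrix product by rule (1.1.3); A is l x m and B is m x n."""
    return [[sum(A[i][v] * B[v][j] for v in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[2, 1], [5, 3]]
A_inv = inverse_2x2(A)          # determinant is 6 - 5 = 1
both_products = (mat_mul(A, A_inv), mat_mul(A_inv, A))
```

Both products come out as the identity matrix, as (1.1.13) requires.
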
Though this isn’t clear from the definition of matrix multiplication, we will see that most
square matrices are invertible, though finding the inverse explicitly is not a simple problem
when the matrix is large. The set of all invertible $n \times n$ matrices is called the $n$-dimensional
general linear group. It will be one of our most important examples when we introduce the
basic concept of a group in the next chapter.

For future reference, we note the following lemma: 


Lemma 1.1.18 A square matrix that has either a row of zeros or a column of zeros is not 
invertible. 


Proof. If a row of an $n \times n$ matrix $A$ is zero and if $B$ is any other $n \times n$ matrix, then the
corresponding row of the product $AB$ is zero too. So $AB$ is not the identity. Therefore $A$ has
no right inverse. A similar argument shows that if a column of $A$ is zero, then $A$ has no left
inverse. □


Block Multiplication 
Various tricks simplify matrix multiplication in favorable cases; block multiplication is one
of them. Let $M$ and $M'$ be $m \times n$ and $n \times p$ matrices, and let $r$ be an integer less than $n$. We
may decompose the two matrices into blocks as follows:

$$M = [A \,|\, B] \quad\text{and}\quad M' = \begin{bmatrix} A' \\ \hline B' \end{bmatrix},$$

where $A$ has $r$ columns and $A'$ has $r$ rows. Then the matrix product can be computed as

$$MM' = AA' + BB'. \tag{1.1.19}$$

Notice that this formula is the same as the rule for multiplying a row vector and a column
vector.


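We may also multiply matrices divided into four blocks. Suppose that we decompose an
$m \times n$ matrix $M$ and an $n \times p$ matrix $M'$ into rectangular submatrices

$$M = \left[\begin{array}{c|c} A & B \\ \hline C & D \end{array}\right], \quad M' = \left[\begin{array}{c|c} A' & B' \\ \hline C' & D' \end{array}\right],$$

where the number of columns of $A$ and $C$ are equal to the number of rows of $A'$ and $B'$. In
this case the rule for block multiplication is the same as for multiplication of $2 \times 2$ matrices:

$$\left[\begin{array}{c|c} A & B \\ \hline C & D \end{array}\right] \left[\begin{array}{c|c} A' & B' \\ \hline C' & D' \end{array}\right] = \left[\begin{array}{c|c} AA' + BC' & AB' + BD' \\ \hline CA' + DC' & CB' + DD' \end{array}\right]. \tag{1.1.20}$$

These rules can be verified directly from the definition of matrix multiplication.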
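
As a sanity check on the two-block rule (1.1.19), the Python sketch below splits a pair of matrices after $r = 2$ columns (respectively rows) and compares the blockwise result with the direct product. The matrices and helper names are invented for illustration.

```python
def mat_mul(A, B):
    """Matrix product by rule (1.1.3); A is l x m and B is m x n."""
    return [[sum(A[i][v] * B[v][j] for v in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_add(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split_columns(M, r):
    """Split M = [A | B], where A consists of the first r columns."""
    return [row[:r] for row in M], [row[r:] for row in M]


def split_rows(M, r):
    """Split M into the first r rows and the remaining rows."""
    return M[:r], M[r:]


M = [[1, 0, 5],
     [0, 1, 7]]            # 2 x 3
M_prime = [[2, 3],
           [4, 8],
           [0, 1]]         # 3 x 2

A, B = split_columns(M, 2)
A_prime, B_prime = split_rows(M_prime, 2)

blockwise = mat_add(mat_mul(A, A_prime), mat_mul(B, B_prime))  # AA' + BB'
direct = mat_mul(M, M_prime)
```
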
Please use block multiplication to verify the equation


Besides facilitating computations, block multiplication is a useful tool for proving facts 
about matrices by induction. 


Matrix Units 


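The matrix units are the simplest nonzero matrices. The $m \times n$ matrix unit $e_{ij}$ has a 1 in the
$i, j$ position as its only nonzero entry:

$$e_{ij} = \begin{bmatrix} & \vdots & \\ \cdots & 1 & \cdots \\ & \vdots & \end{bmatrix} \quad (\text{the 1 is in row } i \text{ and column } j). \tag{1.1.21}$$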


We usually denote matrices by uppercase (capital) letters, but the use of a lowercase letter 
for a matrix unit is traditional. 


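• The set of matrix units is called a basis for the space of all $m \times n$ matrices, because every
$m \times n$ matrix $A = (a_{ij})$ is a linear combination of the matrices $e_{ij}$:

$$A = a_{11} e_{11} + a_{12} e_{12} + \cdots = \sum_{i,j} a_{ij} e_{ij}. \tag{1.1.22}$$

The indices $i, j$ under the sigma mean that the sum is to be taken over all $i = 1, \ldots, m$ and
all $j = 1, \ldots, n$. For instance,

$$\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} = 3 \begin{bmatrix} 1 & \\ & \end{bmatrix} + 2 \begin{bmatrix} & 1 \\ & \end{bmatrix} + 1 \begin{bmatrix} & \\ 1 & \end{bmatrix} + 4 \begin{bmatrix} & \\ & 1 \end{bmatrix} = 3e_{11} + 2e_{12} + 1e_{21} + 4e_{22}.$$

The product of an $m \times n$ matrix unit $e_{ij}$ and an $n \times p$ matrix unit $e_{j\ell}$ is given by the formulas

$$e_{ij} e_{j\ell} = e_{i\ell} \quad\text{and}\quad e_{ij} e_{k\ell} = 0 \ \text{ if } j \ne k. \tag{1.1.23}$$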


• The column vector $e_i$, which has a single nonzero entry 1 in the position $i$, is analogous
to a matrix unit, and the set $\{e_1, \ldots, e_n\}$ of these vectors forms what is called the standard
basis of the $n$-dimensional space $\mathbb{R}^n$ (see Chapter 3, (3.4.15)). If $X$ is a column vector with
entries $(x_1, \ldots, x_n)$, then

$$X = x_1 e_1 + \cdots + x_n e_n = \sum_i x_i e_i. \tag{1.1.24}$$

The formulas for multiplying matrix units and standard basis vectors are

$$e_{ij} e_j = e_i, \quad\text{and}\quad e_{ij} e_k = 0 \ \text{ if } j \ne k. \tag{1.1.25}$$
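
The rules for matrix units can be confirmed on small cases. The following Python sketch builds matrix units with 1-based indices, as in the text; the function names and the particular sizes are illustrative assumptions.

```python
def matrix_unit(i, j, m, n):
    """Return the m x n matrix unit e_ij (1-based indices i, j)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(1, n + 1)]
            for r in range(1, m + 1)]


def mat_mul(A, B):
    """Matrix product by rule (1.1.3); A is l x m and B is m x n."""
    return [[sum(A[i][v] * B[v][j] for v in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# (1.1.22): rebuild A as the linear combination of the a_ij * e_ij.
A = [[3, 2], [1, 4]]
rebuilt = [[0, 0], [0, 0]]
for i in (1, 2):
    for j in (1, 2):
        e = matrix_unit(i, j, 2, 2)
        rebuilt = [[rebuilt[r][c] + A[i - 1][j - 1] * e[r][c] for c in range(2)]
                   for r in range(2)]

# (1.1.23): e_12 e_21 = e_11, while e_12 e_31 = 0 because 2 != 3.
matching = mat_mul(matrix_unit(1, 2, 2, 3), matrix_unit(2, 1, 3, 2))
mismatched = mat_mul(matrix_unit(1, 2, 2, 3), matrix_unit(3, 1, 3, 2))
```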


1.2 ROW REDUCTION 


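Left multiplication by an $m \times n$ matrix $A$ on $n \times p$ matrices, say

$$AX = Y, \tag{1.2.1}$$

can be computed by operating on the rows of $X$. If we let $X_i$ and $Y_i$ denote the $i$th rows of
$X$ and $Y$, respectively, then in vector notation,

$$Y_i = a_{i1} X_1 + \cdots + a_{in} X_n, \tag{1.2.2}$$

$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_m \end{bmatrix}.$$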


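For instance, the bottom row of the product

$$\begin{bmatrix} 0 & 1 \\ -2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 \\ 1 & 3 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 3 & 0 \\ 1 & 5 & -2 \end{bmatrix}$$

can be computed as $-2\,[1 \ 2 \ 1] + 3\,[1 \ 3 \ 0] = [1 \ 5 \ -2]$.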

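Left multiplication by an invertible matrix is called a row operation. We discuss these
row operations next. Some square matrices called elementary matrices are used. There are
three types of elementary $2 \times 2$ matrices:

$$\text{(i)}\ \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \text{ or } \begin{bmatrix} 1 & 0 \\ a & 1 \end{bmatrix}, \qquad \text{(ii)}\ \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \qquad \text{(iii)}\ \begin{bmatrix} c & \\ & 1 \end{bmatrix} \text{ or } \begin{bmatrix} 1 & \\ & c \end{bmatrix}, \tag{1.2.3}$$

where $a$ can be any scalar and $c$ can be any nonzero scalar.

There are also three types of elementary $n \times n$ matrices. They are obtained by splicing
the elementary $2 \times 2$ matrices symmetrically into an identity matrix. They are shown below
with a $5 \times 5$ matrix to save space, but the size is supposed to be arbitrary.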
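(1.2.4) In the matrices displayed here, rows and columns $i$ and $j$ are drawn as the second and fourth.

Type (i):

$$\begin{bmatrix} 1 & & & & \\ & 1 & & a & \\ & & 1 & & \\ & & & 1 & \\ & & & & 1 \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} 1 & & & & \\ & 1 & & & \\ & & 1 & & \\ & a & & 1 & \\ & & & & 1 \end{bmatrix} \quad (i \ne j).$$

One nonzero off-diagonal entry is added to the identity matrix.


Type (ii):

$$\begin{bmatrix} 1 & & & & \\ & 0 & & 1 & \\ & & 1 & & \\ & 1 & & 0 & \\ & & & & 1 \end{bmatrix}$$

The $i$th and $j$th diagonal entries of the identity matrix are replaced by zero, and 1’s are
added in the $(i, j)$ and $(j, i)$ positions.


Type (iii):

$$\begin{bmatrix} 1 & & & & \\ & c & & & \\ & & 1 & & \\ & & & 1 & \\ & & & & 1 \end{bmatrix} \quad (c \ne 0).$$

One diagonal entry of the identity matrix is replaced by a nonzero scalar $c$.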


• The elementary matrices $E$ operate on a matrix $X$ this way: To get the matrix $EX$, you
must:

(1.2.5)
Type (i): with $a$ in the $i, j$ position, “add $a \cdot$(row $j$) of $X$ to (row $i$),”
Type (ii): “interchange (row $i$) and (row $j$) of $X$,”
Type (iii): “multiply (row $i$) of $X$ by a nonzero scalar $c$.”


These are the elementary row operations. Please verify the rules. 
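
One way to verify the rules is to build the three types of elementary matrices and multiply. The Python sketch below does this with 1-based row indices; the constructor names and the test matrix `X` are illustrative choices, not part of the text.

```python
def identity(n):
    """The n x n identity matrix."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def mat_mul(A, B):
    """Matrix product by rule (1.1.3); A is l x m and B is m x n."""
    return [[sum(A[i][v] * B[v][j] for v in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def elementary_add(n, i, j, a):
    """Type (i): the identity with a placed in the (i, j) position, i != j."""
    if i == j:
        raise ValueError("Type (i) needs an off-diagonal position")
    E = identity(n)
    E[i - 1][j - 1] = a
    return E


def elementary_swap(n, i, j):
    """Type (ii): the identity with rows i and j interchanged."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E


def elementary_scale(n, i, c):
    """Type (iii): the identity with the i-th diagonal entry replaced by c != 0."""
    if c == 0:
        raise ValueError("the scalar c must be nonzero")
    E = identity(n)
    E[i - 1][i - 1] = c
    return E


X = [[1, 2, 1], [1, 3, 0], [0, 1, 4]]
added = mat_mul(elementary_add(3, 1, 3, 2), X)    # add 2*(row 3) to (row 1)
swapped = mat_mul(elementary_swap(3, 1, 2), X)    # interchange rows 1 and 2
scaled = mat_mul(elementary_scale(3, 2, 5), X)    # multiply (row 2) by 5
undone = mat_mul(elementary_add(3, 1, 3, -2), added)  # the inverse operation
```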


Lemma 1.2.6 Elementary matrices are invertible, and their inverses are also elementary 
matrices. 


Proof. The inverse of an elementary matrix is the matrix corresponding to the inverse row
operation: “subtract $a \cdot$(row $j$) from (row $i$),” “interchange (row $i$) and (row $j$)” again, or
“multiply (row $i$) by $c^{-1}$.” □




We now perform elementary row operations (1.2.5) on a matrix $M$, with the aim of
ending up with a simpler matrix:

$$M \ \overset{\text{sequence of operations}}{\longrightarrow \cdots \longrightarrow} \ M'.$$

Since each elementary operation is obtained by multiplying by an elementary matrix, we
can express the result of a sequence of such operations as multiplication by a sequence
$E_1, \ldots, E_k$ of elementary matrices:

$$M' = E_k \cdots E_2 E_1 M. \tag{1.2.7}$$

This procedure to simplify a matrix is called row reduction.


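As an example, we use elementary operations to simplify a matrix by clearing out as
many entries as possible, working from the left.

$$M = \begin{bmatrix} 1 & 1 & 2 & 1 & 5 \\ 1 & 1 & 2 & 6 & 10 \\ 1 & 2 & 5 & 2 & 7 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 2 & 1 & 5 \\ 0 & 0 & 0 & 5 & 5 \\ 0 & 1 & 3 & 1 & 2 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 2 & 1 & 5 \\ 0 & 1 & 3 & 1 & 2 \\ 0 & 0 & 0 & 5 & 5 \end{bmatrix} \tag{1.2.8}$$

$$\to \begin{bmatrix} 1 & 0 & -1 & 0 & 3 \\ 0 & 1 & 3 & 1 & 2 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix} \to \begin{bmatrix} \mathbf{1} & 0 & -1 & 0 & 3 \\ 0 & \mathbf{1} & 3 & 0 & 1 \\ 0 & 0 & 0 & \mathbf{1} & 1 \end{bmatrix} = M'.$$

The matrix $M'$ cannot be simplified further by row operations.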


Here is the way that row reduction is used to solve systems of linear equations.
Suppose we are given a system of $m$ equations in $n$ unknowns, say $AX = B$, where $A$
is an $m \times n$ matrix, $B$ is a given column vector, and $X$ is an unknown column vector. To
solve this system, we form the $m \times (n + 1)$ block matrix, sometimes called the augmented
matrix

$$M = [A \,|\, B] = \left[\begin{array}{ccc|c} a_{11} & \cdots & a_{1n} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \cdots & a_{mn} & b_m \end{array}\right], \tag{1.2.9}$$

and we perform row operations to simplify $M$. Note that $EM = [EA \,|\, EB]$. Let

$$M' = [A' \,|\, B']$$

be the result of a sequence of row operations. The key observation is this:


Proposition 1.2.10 The systems $A'X = B'$ and $AX = B$ have the same solutions.

Proof. Since $M'$ is obtained by a sequence of elementary row operations, there are elementary
matrices $E_1, \ldots, E_k$ such that, with $P = E_k \cdots E_1$,

$$M' = E_k \cdots E_1 M = PM.$$

The matrix $P$ is invertible, and $M' = [A' \,|\, B'] = [PA \,|\, PB]$. If $X$ is a solution of the original
equation $AX = B$, we multiply by $P$ on the left: $PAX = PB$, which is to say, $A'X = B'$.
So $X$ also solves the new equation. Conversely, if $A'X = B'$, then $P^{-1}A'X = P^{-1}B'$, that is,
$AX = B$. □


For example, consider the system

$$\begin{aligned} x_1 + x_2 + 2x_3 + x_4 &= 5 \\ x_1 + x_2 + 2x_3 + 6x_4 &= 10 \\ x_1 + 2x_2 + 5x_3 + 2x_4 &= 7. \end{aligned} \tag{1.2.11}$$

Its augmented matrix is the matrix $M$ whose row reduction is shown above. The system of
equations is equivalent to the one defined by the end result $M'$ of the reduction:

$$\begin{aligned} x_1 - x_3 &= 3 \\ x_2 + 3x_3 &= 1 \\ x_4 &= 1. \end{aligned}$$

We can read off the solutions of this system easily: If we choose $x_3 = c$ arbitrarily, we can
solve for $x_1$, $x_2$, and $x_4$. The general solution of (1.2.11) can be written in the form

$$x_3 = c, \quad x_1 = 3 + c, \quad x_2 = 1 - 3c, \quad x_4 = 1,$$

where $c$ is arbitrary.

We now go back to row reduction of an arbitrary matrix. It is not hard to see that, by 
a sequence of row operations, any matrix M can be reduced to what is called a row echelon 
matrix. The end result of our reduction of (1.2.8) is an example. Here is the definition: A 
row echelon matrix is a matrix that has these properties: 


(1.2.12) 


(a) If (row i) of M is zero, then (row j) is zero for all j > i. 

(b) If (row i) isn’t zero, its first nonzero entry is 1. This entry is called a pivot. 

(c) If (row (i + 1)) isn’t zero, the pivot in (row (i + 1)) is to the right of the pivot in (row i).
(d) The entries above a pivot are zero. (The entries below a pivot are zero too, by (c).) 


The pivots in the matrix M’ of (1.2.8) and in the examples below are shown in boldface. 


To make a row reduction, find the first column that contains a nonzero entry, say
$m$. (If there is none, then $M$ is zero, and is itself a row echelon matrix.) Interchange rows
using an elementary operation of Type (ii) to move $m$ to the top row. Normalize $m$ to 1
using an operation of Type (iii). This entry becomes a pivot. Clear out the entries below
this pivot by a sequence of operations of Type (i). The resulting matrix will have the
block form

$$\left[\begin{array}{ccc|c|ccc} 0 & \cdots & 0 & 1 & * & \cdots & * \\ 0 & \cdots & 0 & 0 & * & \cdots & * \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & * & \cdots & * \end{array}\right], \quad\text{which we write as}\quad \left[\begin{array}{c|c|c} & 1 & B_1 \\ \hline & & D_1 \end{array}\right] = M_1.$$

We now perform row operations to simplify the smaller matrix $D_1$. Because the blocks to
the left of $D_1$ are zero, these operations will have no effect on the rest of the matrix $M_1$. By
induction on the number of rows, we may assume that $D_1$ can be reduced to a row echelon
matrix, say to $D_2$, and $M_1$ is thereby reduced to the matrix

$$\left[\begin{array}{c|c|c} & 1 & B_1 \\ \hline & & D_2 \end{array}\right] = M_2.$$

This matrix satisfies the first three requirements for a row echelon matrix. The entries in $B_1$
above the pivots of $D_2$ can be cleared out at this time, to finish the reduction to row echelon
form. □
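
The procedure can be turned into a short program. The Python sketch below is one possible arrangement, not the recursive argument given above: it works column by column and clears the entries above and below each pivot in the same pass. It uses the standard library's `Fraction` type for exact arithmetic; the name `row_reduce` is an illustrative choice.

```python
from fractions import Fraction


def row_reduce(M):
    """Return a row echelon matrix (1.2.12) obtained from M by row operations."""
    R = [[Fraction(x) for x in row] for row in M]
    rows = len(R)
    cols = len(R[0]) if R else 0
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Look for a nonzero entry in this column, at or below pivot_row.
        src = next((r for r in range(pivot_row, rows) if R[r][col] != 0), None)
        if src is None:
            continue
        # Type (ii): move that row up into the pivot position.
        R[pivot_row], R[src] = R[src], R[pivot_row]
        # Type (iii): normalize the leading entry to 1.
        p = R[pivot_row][col]
        R[pivot_row] = [x / p for x in R[pivot_row]]
        # Type (i): clear the other entries in the pivot column.
        for r in range(rows):
            if r != pivot_row and R[r][col] != 0:
                f = R[r][col]
                R[r] = [x - f * y for x, y in zip(R[r], R[pivot_row])]
        pivot_row += 1
    return R


# The augmented matrix of the system (1.2.11).
M = [[1, 1, 2, 1, 5],
     [1, 1, 2, 6, 10],
     [1, 2, 5, 2, 7]]
M_reduced = row_reduce(M)
```

Applied to the augmented matrix of (1.2.11), the function returns the matrix of the system $x_1 - x_3 = 3$, $x_2 + 3x_3 = 1$, $x_4 = 1$.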


It can be shown that the row echelon matrix obtained from a matrix M by row reduction 
doesn’t depend on the particular sequence of operations used in the reduction. Since this 
point will not be important for us, we omit the proof. 

As we said before, row reduction is useful because one can solve a system of equations 
A’X = B’ easily when A’ is in row echelon form. Another example: Suppose that 


There is no solution to A’X = B’ because the third equation is 0 = 1. On the other hand, 


\[
[A'|B'] = \left[\begin{array}{cccc|c} 1 & 6 & 0 & 1 & 1 \\ 0 & 0 & 1 & 2 & \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\]


has solutions. Choosing $x_2 = c$ and $x_4 = c'$ arbitrarily, we can solve the first equation for $x_1$ 
and the second for $x_3$. The general rule is this: 


Proposition 1.2.13 Let M’ = [A’|B’] be a block row echelon matrix, where B’ is a column 
vector. The system of equations A’X = B’ has a solution if and only if there is no pivot in the 
last column B’. In that case, arbitrary values can be assigned to the unknown $x_i$, provided 
that (column i) does not contain a pivot. When these arbitrary values are assigned, the other 
unknowns are determined uniquely. □ 
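
The proposition translates directly into a small procedure. The sketch below is only an illustration under stated assumptions: the input is assumed to be already in row echelon form, columns are indexed from 0, and the names `describe_solutions` and `solution` as well as the sample matrices are hypothetical.

```python
def describe_solutions(M):
    """M is a block row echelon matrix [A'|B'], given as a list of rows.

    Returns (solvable, free): whether A'X = B' has a solution, and the
    0-based columns of A' that contain no pivot.
    """
    n = len(M[0]) - 1                      # number of unknowns
    pivots = []
    for row in M:
        nonzero = [j for j, x in enumerate(row) if x != 0]
        if nonzero:
            pivots.append(nonzero[0])      # column of the first nonzero entry
    solvable = n not in pivots             # no pivot in the last column B'
    free = [j for j in range(n) if j not in pivots]
    return solvable, free

def solution(M, free_values):
    """One solution of a solvable system, given {free column: chosen value}."""
    n = len(M[0]) - 1
    X = [free_values.get(j, 0) for j in range(n)]
    for row in M:
        nonzero = [j for j, x in enumerate(row[:n]) if x != 0]
        if nonzero:
            piv = nonzero[0]
            # The other nonzero entries of a pivot row lie in free columns.
            X[piv] = row[n] - sum(row[j] * X[j] for j in nonzero[1:])
    return X

M = [[1, 2, 0, 3],
     [0, 0, 1, 4],
     [0, 0, 0, 0]]
```

Here the second unknown is free; assigning it the value 5 with `solution(M, {1: 5})` determines the other two unknowns.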


Every homogeneous linear equation AX = 0 has the trivial solution X = 0. But looking 
at the row echelon form again, we conclude that if there are more unknowns than equations 
then the homogeneous equation AX = 0 has a nontrivial solution. 


Corollary 1.2.14 Every system AX = 0 of m homogeneous equations in n unknowns, with 
m < n, has a solution X in which some $x_i$ is nonzero. 


Proof. Row reduction of the block matrix [A|0] yields a matrix [A’|0] in which A’ is in row 
echelon form. The equation A’X = 0 has the same solutions as AX = 0. The number, say r, 
of pivots of A’ is at most equal to the number m of rows, so it is less than n. The proposition 
tells us that we may assign arbitrary values to n − r variables $x_i$. Since n − r > 0, we may 
choose one of these values to be nonzero. □ 


We now use row reduction to characterize invertible matrices. 


Lemma 1.2.15 A square row echelon matrix M is either the identity matrix I, or else its 
bottom row is zero. 


Proof. Say that M is an n×n row echelon matrix. Since there are n columns, there are at most 
n pivots, and if there are n of them, there has to be one in each column. In this case, M = I. 
If there are fewer than n pivots, then some row is zero, and the bottom row is zero too. □ 


Theorem 1.2.16 Let A be a square matrix. The following conditions are equivalent: 


(a) A can be reduced to the identity by a sequence of elementary row operations. 
(b) A is a product of elementary matrices. 
(c) A is invertible. 


Proof. We prove the theorem by proving the implications (a) ⇒ (b) ⇒ (c) ⇒ (a). Suppose 
that A can be reduced to the identity by row operations, say $E_k \cdots E_1 A = I$. Multiplying 
both sides of this equation on the left by $E_1^{-1} \cdots E_k^{-1}$, we obtain $A = E_1^{-1} \cdots E_k^{-1}$. Since 
the inverse of an elementary matrix is elementary, (b) holds, and therefore (a) implies (b). 
Because a product of invertible matrices is invertible, (b) implies (c). Finally, we prove the 
implication (c) ⇒ (a). If A is invertible, so is the end result A’ of its row reduction. Since an 
invertible matrix cannot have a row of zeros, Lemma 1.2.15 shows that A’ is the identity. □ 


Row reduction provides a method to compute the inverse of an invertible matrix A: 
We reduce A to the identity by row operations: $E_k \cdots E_1 A = I$ as above. Multiplying both 
sides of this equation on the right by $A^{-1}$, we obtain 

\[
E_k \cdots E_1 I = E_k \cdots E_1 = A^{-1}.
\]

Corollary 1.2.17 Let A be an invertible matrix. To compute its inverse, one may apply 
elementary row operations $E_1, \ldots, E_k$ to A, reducing it to the identity matrix. The same 
sequence of operations, when applied to the identity matrix I, yields $A^{-1}$. □ 
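
Example 1.2.18 We invert the matrix $A = \begin{bmatrix} 1 & 5 \\ 2 & 6 \end{bmatrix}$. To do this, we form the 2×4 block 
matrix 

\[
[A|I] = \left[\begin{array}{cc|cc} 1 & 5 & 1 & 0 \\ 2 & 6 & 0 & 1 \end{array}\right].
\]

We perform row operations to reduce A to the identity, carrying the right side along, and 
thereby end up with $A^{-1}$ on the right. 

\[
[A|I] =
\left[\begin{array}{cc|cc} 1 & 5 & 1 & 0 \\ 2 & 6 & 0 & 1 \end{array}\right] \to
\left[\begin{array}{cc|cc} 1 & 5 & 1 & 0 \\ 0 & -4 & -2 & 1 \end{array}\right] \to
\left[\begin{array}{cc|cc} 1 & 5 & 1 & 0 \\ 0 & 1 & \tfrac12 & -\tfrac14 \end{array}\right] \to
\left[\begin{array}{cc|cc} 1 & 0 & -\tfrac32 & \tfrac54 \\ 0 & 1 & \tfrac12 & -\tfrac14 \end{array}\right]
= [I|A^{-1}]. \tag{1.2.19}
\]
□ 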
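
The method of Corollary 1.2.17 can be carried out mechanically. The following sketch is an illustration only: the name `inverse` is our own, exact fractions are assumed, and the routine reduces the block matrix [A|I] and reads off the right-hand block.

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by row reducing the block matrix [A | I]."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # Find a row at or below row c with a nonzero entry in column c.
        src = next((i for i in range(c, n) if M[i][c] != 0), None)
        if src is None:
            raise ValueError("matrix is not invertible")
        M[c], M[src] = M[src], M[c]          # interchange rows
        p = M[c][c]
        M[c] = [x / p for x in M[c]]         # normalize the pivot to 1
        for i in range(n):                   # clear the rest of column c
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]            # the right block is the inverse
```

Applied to the matrix of Example 1.2.18, `inverse([[1, 5], [2, 6]])` reproduces the right-hand block of (1.2.19).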


Proposition 1.2.20 Let A be a square matrix that has either a left inverse or a right inverse, 
a matrix B such that either BA = I or AB = I. Then A is invertible, and B is its inverse. 


Proof. Suppose that AB = I. We perform row reduction on A. Say that A’ = PA, where 
$P = E_k \cdots E_1$ is the product of the corresponding elementary matrices, and A’ is a row 
echelon matrix. Then A’B = PAB = P. Because P is invertible, its bottom row isn’t zero. 
Then the bottom row of A’ can’t be zero either. Therefore A’ is the identity matrix (1.2.15), 
and so P is a left inverse of A. Then A has both a left inverse and a right inverse, so it is 
invertible and B is its inverse. 

If BA = I, we interchange the roles of A and B in the above reasoning. We find that B 
is invertible and that its inverse is A. Then A is invertible, and its inverse is B. □ 


We come now to the main theorem about square systems of linear equations: 


Theorem 1.2.21 Square Systems. The following conditions on a square matrix A are 
equivalent: 

(a) A is invertible. 

(b) The system of equations AX = B has a unique solution for every column vector B. 

(c) The system of homogeneous equations AX = 0 has only the trivial solution X = 0. 


Proof. Given the system AX = B, we reduce the augmented matrix [A|B] to row echelon 
form [A’|B’]. The system A’X = B’ has the same solutions. If A is invertible, then A’ is the 
identity matrix, so the unique solution is X = B’. This shows that (a) ⇒ (b). 

If an n×n matrix A is not invertible, then A’ has a row of zeros. One of the equations 
making up the system A’X = 0 is the trivial equation. So there are fewer than n pivots. 
The homogeneous system A’X = 0 has a nontrivial solution (1.2.13), and so does AX = 0 
(1.2.14). This shows that if (a) fails, then (c) also fails, hence that (c) ⇒ (a). 
Finally, it is obvious that (b) ⇒ (c). □ 


We want to take particular note of the implication (c) ⇒ (b) of the theorem: 


If the homogeneous equation AX = 0 has only the trivial solution, 
then the general equation AX = B has a unique solution for every column vector B. 


This can be useful because the homogeneous system may be easier to handle than the general 
system. 


Example 1.2.22 There exists a polynomial p(t) of degree at most n that takes prescribed values, say 
$p(a_i) = b_i$, at n + 1 distinct points $t = a_0, \ldots, a_n$ on the real line.² To find this polynomial, 
one must solve a system of linear equations in the undetermined coefficients of p(t). In 
order not to overload the notation, we'll do the case n = 2, so that 

\[
p(t) = x_0 + x_1 t + x_2 t^2.
\]

Let $a_0, a_1, a_2$ and $b_0, b_1, b_2$ be given. The equations to be solved are obtained by substituting 
$a_i$ for t. Moving the coefficients $x_i$ to the right, they are 

\[
x_0 + a_i x_1 + a_i^2 x_2 = b_i
\]

for i = 0, 1, 2. This is a system AX = B of three linear equations in the three unknowns 
$x_0, x_1, x_2$, with 

\[
A = \begin{bmatrix} 1 & a_0 & a_0^2 \\ 1 & a_1 & a_1^2 \\ 1 & a_2 & a_2^2 \end{bmatrix}.
\]

The homogeneous equation, in which B = 0, asks for a polynomial with 3 roots $a_0, a_1, a_2$. A 
nonzero polynomial of degree at most 2 can have at most two roots, so the homogeneous equation 
has only the trivial solution. Therefore there is a unique solution for every set of prescribed 
values $b_0, b_1, b_2$. 

By the way, there is a formula, the Lagrange Interpolation Formula, that exhibits the 
polynomial p(t) explicitly. □ 
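
For readers who wish to experiment, here is a short sketch that evaluates the interpolating polynomial by the Lagrange formula, which the text mentions but does not state. We assume the standard form p(t) = Σ b_i Π_{j≠i} (t − a_j)/(a_i − a_j); the name `lagrange_eval` is ours, and the arguments are assumed to be integers so that exact fractions can be used.

```python
from fractions import Fraction

def lagrange_eval(a, b, t):
    """Value at t of the polynomial p of degree <= n with p(a[i]) = b[i].

    The points a[0], ..., a[n] must be distinct integers; t is an integer.
    """
    total = Fraction(0)
    for i, (ai, bi) in enumerate(zip(a, b)):
        term = Fraction(bi)
        for j, aj in enumerate(a):
            if j != i:
                term *= Fraction(t - aj, ai - aj)   # factor equals 1 at t = a_i, 0 at t = a_j
        total += term
    return total
```

With a = (0, 1, 2) and b = (1, 3, 7), the interpolating polynomial is 1 + t + t², so the value at t = 3 should be 13.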


1.3 THE MATRIX TRANSPOSE 


In the discussion of the previous section, we chose to work with rows in order to apply the 
results to systems of linear equations. One may also perform column operations to simplify 
a matrix, and it is evident that similar results will be obtained. 


² Elements of a set are said to be distinct if no two of them are equal. 

Rows and columns are interchanged by the transpose operation on matrices. The 
transpose of an m×n matrix A is the n×m matrix $A^t$ obtained by reflecting about the 
diagonal: $A^t = (b_{ij})$, where $b_{ij} = a_{ji}$. For instance, 

\[
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}^t = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}^t = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.
\]

Here are the rules for computing with the transpose: 

\[
(AB)^t = B^t A^t, \quad (A+B)^t = A^t + B^t, \quad (cA)^t = cA^t, \quad (A^t)^t = A. \tag{1.3.1}
\]
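
The first rule of (1.3.1) is the one that is easiest to get backwards, and it can be checked numerically. The helper names `transpose` and `matmul` and the sample matrices below are our own; matrices are represented as lists of rows.

```python
def transpose(A):
    """Reflect a matrix (list of rows) about its diagonal."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product AB, rows of A against columns of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 2]]
left = transpose(matmul(A, B))               # (AB)^t
right = matmul(transpose(B), transpose(A))   # B^t A^t, with the order reversed
```

The two results `left` and `right` agree, whereas the product of the transposes taken in the original order generally does not.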


Using the first of these formulas, we can deduce facts about right multiplication from the 
corresponding facts about left multiplication. The elementary matrices (1.2.4) act by right 
multiplication AE as the following elementary column operations: 


(1.3.2) “with a in the i, j position, add a·(column i) to (column j)”; 
“interchange (column i) and (column j)”; 
“multiply (column i) by a nonzero scalar c.” 


Note that in the first of these operations, the indices i, j are the reverse of those in (1.2.5a). 


1.4 DETERMINANTS 


Every square matrix A has a number associated to it called its determinant, and denoted by 
det A. We define the determinant and derive some of its properties here. 

The determinant of a 1×1 matrix is equal to its single entry 

\[
\det [a] = a, \tag{1.4.1}
\]

and the determinant of a 2×2 matrix is given by the formula 

\[
\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc. \tag{1.4.2}
\]

The determinant of a 2×2 matrix A has a geometric interpretation. Left multiplication 
by A maps the space ℝ² of real two-dimensional column vectors to itself, and the area of 
the parallelogram that forms the image of the unit square via this map is the absolute value 
of the determinant of A. The determinant is positive or negative, according to whether the 
orientation of the square is preserved or reversed by the operation. Moreover, det A = 0 if 
and only if the parallelogram degenerates to a line segment or a point, which happens when 
the columns of the matrix are proportional. 

A picture of this operation for a particular 2×2 matrix is shown in Figure (1.4.3). The 
shaded region is the image of the unit square under the map. Its area is 10. 

This geometric interpretation extends to higher dimensions. Left multiplication by a 
3×3 real matrix A maps the space ℝ³ of three-dimensional column vectors to itself, and the 
absolute value of its determinant is the volume of the image of the unit cube. 


(1.4.3) [Figure: the image of the unit square under left multiplication by a 2×2 matrix; the shaded parallelogram has area 10.] 


The set of all real n×n matrices forms a space of dimension n² that we denote by 
$\mathbb{R}^{n \times n}$. We regard the determinant of n×n matrices as a function from this space to the real 
numbers: 

\[
\det : \mathbb{R}^{n \times n} \to \mathbb{R}.
\]

The determinant of an n×n matrix is a function of its n² entries. There is one such function 
for each positive integer n. Unfortunately, there are many formulas for these determinants, 
and all of them are complicated when n is large. Not only are the formulas complicated, but 
it may not be easy to show directly that two of them define the same function. 

We use the following strategy: We choose one of the formulas, and take it as our 
definition of the determinant. In that way we are talking about a particular function. We 
show that our chosen function is the only one having certain special properties. Then, to 
show that another formula defines the same determinant function, one needs only to check 
those properties for the other function. This is often not too difficult. 


We use a formula that computes the determinant of an n×n matrix in terms of certain 
(n − 1)×(n − 1) determinants by a process called expansion by minors. The determinants of 
submatrices of a matrix are called minors. Expansion by minors allows us to give a recursive 
definition of the determinant. 

The word recursive means that the definition of the determinant for n×n matrices 
makes use of the determinant for (n − 1)×(n − 1) matrices. Since we have defined the 
determinant for 1×1 matrices, we will be able to use our recursive definition to compute 
2×2 determinants, then knowing this, to compute 3×3 determinants, and so on. 

Let A be an n×n matrix and let $A_{ij}$ denote the (n − 1)×(n − 1) submatrix obtained 
by crossing out the ith row and the jth column of A: 


(1.4.4) 
For example, if 

\[
A = \begin{bmatrix} 1 & 0 & 3 \\ 2 & 1 & 2 \\ 0 & 5 & 1 \end{bmatrix},
\quad\text{then}\quad
A_{21} = \begin{bmatrix} 0 & 3 \\ 5 & 1 \end{bmatrix}.
\]


• Expansion by minors on the first column is the formula 

\[
\det A = a_{11}\det A_{11} - a_{21}\det A_{21} + a_{31}\det A_{31} - \cdots \pm a_{n1}\det A_{n1}. \tag{1.4.5}
\]

The signs alternate, beginning with +. 


It is useful to write this expansion in summation notation: 

\[
\det A = \sum_{\nu} \pm\, a_{\nu 1} \det A_{\nu 1}. \tag{1.4.6}
\]

The alternating sign can be written as $(-1)^{\nu+1}$. It will appear again. We take this formula, 
together with (1.4.1), as a recursive definition of the determinant. 

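For 1×1 and 2×2 matrices, this formula agrees with (1.4.1) and (1.4.2). The determinant 
of the 3×3 matrix A shown above is 

\[
\det A = 1 \cdot \det \begin{bmatrix} 1 & 2 \\ 5 & 1 \end{bmatrix}
- 2 \cdot \det \begin{bmatrix} 0 & 3 \\ 5 & 1 \end{bmatrix}
+ 0 \cdot \det \begin{bmatrix} 0 & 3 \\ 1 & 2 \end{bmatrix}
= 1 \cdot (-9) - 2 \cdot (-15) = 21.
\]

Expansions by minors on other columns and on rows, which we define in Section 1.6, are 
among the other formulas for the determinant. 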
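
The recursive definition translates almost word for word into code. The sketch below is an illustration, with names `minor` and `det` of our own choosing and 0-based indices, so that the sign $(-1)^{\nu+1}$ of the text becomes `(-1) ** v`.

```python
def minor(A, i, j):
    """The submatrix A_ij: delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion by minors on the first column, as in (1.4.5)."""
    n = len(A)
    if n == 1:
        return A[0][0]                     # the base case (1.4.1)
    return sum((-1) ** v * A[v][0] * det(minor(A, v, 0)) for v in range(n))
```

For the 3×3 matrix A of the example above, `det([[1, 0, 3], [2, 1, 2], [0, 5, 1]])` gives 21. The number of recursive calls grows like n!, so this is a definition to reason with rather than an efficient algorithm.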

It is important to know the many special properties satisfied by determinants. We 
present some of these properties here, deferring proofs to the end of the section. Because 
we want to apply the discussion to other formulas, the properties will be stated for an 
unspecified function δ. 


Theorem 1.4.7 Uniqueness of the Determinant. There is a unique function δ on the space of 
n×n matrices with the properties below, namely the determinant (1.4.5). 

(i) With I denoting the identity matrix, δ(I) = 1. 

(ii) δ is linear in the rows of the matrix A. 
(iii) If two adjacent rows of a matrix A are equal, then δ(A) = 0. 


The statement that δ is linear in the rows of a matrix means this: Let $A_i$ denote the ith row 
of a matrix A. Let A, B, D be three matrices, all of whose entries are equal, except for those 
in the rows indexed by k. Suppose furthermore that $D_k = cA_k + c'B_k$ for some scalars c and 
c’. Then δ(D) = c δ(A) + c’ δ(B): 

\[
\delta \begin{bmatrix} \vdots \\ cA_k + c'B_k \\ \vdots \end{bmatrix}
= c\,\delta \begin{bmatrix} \vdots \\ A_k \\ \vdots \end{bmatrix}
+ c'\,\delta \begin{bmatrix} \vdots \\ B_k \\ \vdots \end{bmatrix}. \tag{1.4.8}
\]
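
This allows us to operate on one row at a time, the other rows being left fixed. For example, 
since [0 2 3] = 2[0 1 0] + 3[0 0 1], 

\[
\delta \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 3 \\ 0 & 0 & 1 \end{bmatrix}
= 2\,\delta \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
+ 3\,\delta \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
= 2 \cdot 1 + 3 \cdot 0 = 2.
\]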


Perhaps the most important property of the determinant is its compatibility with matrix 
multiplication. 


Theorem 1.4.9 Multiplicative Property of the Determinant. For any n×n matrices A and B, 
det (AB) = (det A)(det B). 


The next theorem gives additional properties that are implied by those listed in (1.4.7). 


Theorem 1.4.10 Let δ be a function on n×n matrices that has the properties (1.4.7)(i,ii,iii). 
Then 

(a) If A’ is obtained from A by adding a multiple of (row j) of A to (row i) and i ≠ j, then 
δ(A’) = δ(A). 

(b) If A’ is obtained by interchanging (row i) and (row j) of A and i ≠ j, then 
δ(A’) = −δ(A). 

(c) If A’ is obtained from A by multiplying (row i) by a scalar c, then δ(A’) = c δ(A). 
If a row of a matrix A is equal to zero, then δ(A) = 0. 

(d) If (row i) of A is equal to a multiple of (row j) and i ≠ j, then δ(A) = 0. 


We now proceed to prove the three theorems stated above, in reverse order. The fact 
that there are quite a few points to be examined makes the proofs lengthy. This can’t be 
helped. 


Proof of Theorem 1.4.10. The first assertion of (c) is a part of linearity in rows (1.4.7)(ii). 
The second assertion of (c) follows, because a row that is zero can be multiplied by 0 without 
changing the matrix, and it multiplies δ(A) by 0. 

Next, we verify properties (a),(b),(d) when i and j are adjacent indices, say j = i + 1. To 
simplify our display, we represent the matrices schematically, denoting the rows in question 
by R = (row i) and S = (row j), and suppressing notation for the other rows. So $\begin{bmatrix} R \\ S \end{bmatrix}$ 
denotes our given matrix A. Then by linearity in the ith row, 

\[
\delta \begin{bmatrix} R + cS \\ S \end{bmatrix}
= \delta \begin{bmatrix} R \\ S \end{bmatrix} + c\,\delta \begin{bmatrix} S \\ S \end{bmatrix}. \tag{1.4.11}
\]

The first term on the right side is δ(A), and the second is zero (1.4.7). This proves (a) for 
adjacent indices. To verify (b) for adjacent indices, we use (a) repeatedly. Denoting the rows 
by R and S as before: 

\[
\delta \begin{bmatrix} R \\ S \end{bmatrix}
= \delta \begin{bmatrix} R - S \\ S \end{bmatrix}
= \delta \begin{bmatrix} R - S \\ S + (R - S) \end{bmatrix}
= \delta \begin{bmatrix} R - S \\ R \end{bmatrix}
= \delta \begin{bmatrix} -S \\ R \end{bmatrix}
= -\,\delta \begin{bmatrix} S \\ R \end{bmatrix}. \tag{1.4.12}
\]


Finally, (d) for adjacent indices follows from (c) and (1.4.7)(iii). 


To complete the proof, we verify (a),(b),(d) for an arbitrary pair of distinct indices. 
Suppose that (row i) is a multiple of (row j). We switch adjacent rows a few times to obtain 
a matrix A’ in which the two rows in question are adjacent. Then (d) for adjacent rows tells 
us that δ(A’) = 0, and (b) for adjacent rows tells us that δ(A’) = ± δ(A). So δ(A) = 0, and 
this proves (d). At this point, the proofs that we have given for (a) and (b) in the case of 
adjacent indices carry over to an arbitrary pair of indices. □ 


The rules (1.4.10)(a),(b),(c) show how multiplication by an elementary matrix affects 
δ, and they lead to the next corollary. 


Corollary 1.4.13 Let δ be a function on n×n matrices with the properties (1.4.7), and let 
E be an elementary matrix. For any matrix A, δ(EA) = δ(E)δ(A). Moreover, 


(i) If E is of the first kind (add a multiple of one row to another), then δ(E) = 1. 
(ii) If E is of the second kind (row interchange), then δ(E) = −1. 
(iii) If E is of the third kind (multiply a row by c), then δ(E) = c. 


Proof. The rules (1.4.10)(a),(b),(c) describe the effect of an elementary row operation on 
δ(A), so they tell us how to compute δ(EA) from δ(A). They tell us that δ(EA) = ε δ(A), 
where ε = 1, −1, or c according to the type of elementary matrix. By setting A = I, we find 
that δ(E) = δ(EI) = ε δ(I) = ε. □ 


Proof of the multiplicative property, Theorem 1.4.9. We imagine the first step of a row 
reduction of A, say EA = A’. Suppose we have shown that δ(A’B) = δ(A’)δ(B). We apply 
Corollary 1.4.13: δ(E)δ(A) = δ(A’). Since A’B = E(AB), the corollary also tells us that 
δ(A’B) = δ(E)δ(AB). Thus 

\[
\delta(E)\delta(AB) = \delta(A'B) = \delta(A')\delta(B) = \delta(E)\delta(A)\delta(B).
\]

Canceling δ(E), we see that the multiplicative property is true for A and B as well. This being 
so, induction shows that it suffices to prove the multiplicative property after row-reducing 
A. So we may suppose that A is row reduced. Then A is either the identity, or else its bottom 
row is zero. The property is obvious when A = I. If the bottom row of A is zero, so is the 
bottom row of AB, and Theorem 1.4.10 shows that δ(A) = δ(AB) = 0. The property is true 
in this case as well. □ 


Proof of uniqueness of the determinant, Theorem 1.4.7. There are two parts. To prove 
uniqueness, we perform row reduction on a matrix A, say $A' = E_k \cdots E_1 A$. Corollary 1.4.13 tells us 
how to compute δ(A) from δ(A’). If A’ is the identity, then δ(A’) = 1. Otherwise the bottom 
row of A’ is zero, and in that case Theorem 1.4.10 shows that δ(A’) = 0. This determines 
δ(A) in both cases. 
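
The uniqueness argument is also a practical recipe: reduce A, and keep track of the factors δ(E) contributed by the elementary operations. The following is a sketch under our own naming (`det_by_row_reduction`, `scale`), using exact fractions; it stops once the matrix is upper triangular with 1's on the diagonal, because the remaining operations are of the first kind and do not change δ.

```python
from fractions import Fraction

def det_by_row_reduction(A):
    """Compute delta(A) from the rules of Corollary 1.4.13."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    scale = Fraction(1)                  # product of delta(E) over the operations used
    for c in range(n):
        src = next((i for i in range(c, n) if M[i][c] != 0), None)
        if src is None:
            return Fraction(0)           # fewer than n pivots: A' has a zero row
        if src != c:
            M[c], M[src] = M[src], M[c]
            scale *= -1                  # row interchange: delta(E) = -1
        p = M[c][c]
        M[c] = [x / p for x in M[c]]
        scale *= 1 / p                   # multiplying a row by 1/p: delta(E) = 1/p
        for i in range(c + 1, n):        # adding a multiple of a row: delta(E) = 1
            f = M[i][c]
            M[i] = [u - f * v for u, v in zip(M[i], M[c])]
    # Here delta(A') = 1 and delta(A') = scale * delta(A).
    return 1 / scale
```

Unlike expansion by minors, this takes on the order of n³ arithmetic operations.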

Note: It is a natural idea to try defining determinants using compatibility with multiplication 
and Corollary 1.4.13. Since we can write an invertible matrix as a product of elementary 
matrices, these properties determine the determinant of every invertible matrix. But there 
are many ways to write a given matrix as such a product. Without going through some steps 
as we have, it won’t be clear that two such products will give the same answer. It isn’t easy 
to make this idea work. 


To complete the proof of Theorem 1.4.7, we must show that the determinant function 
(1.4.5) we have defined has the properties (1.4.7). This is done by induction on the size of the 
matrices. We note that the properties (1.4.7) are true when n = 1, in which case det [a] = a. 
So we assume that they have been proved for determinants of (n − 1)×(n − 1) matrices. 
Then all of the properties (1.4.7), (1.4.10), (1.4.13), and (1.4.9) are true for (n − 1)×(n − 1) 
matrices. We proceed to verify (1.4.7) for the function δ = det defined by (1.4.5), and for 
n×n matrices. For reference, they are: 


(i) With I denoting the identity matrix, det (I) = 1. 
(ii) det is linear in the rows of the matrix A. 
(iii) If two adjacent rows of a matrix A are equal, then det (A) = 0. 


(i) If $A = I_n$, then $a_{11} = 1$ and $a_{\nu 1} = 0$ when ν > 1. The expansion (1.4.5) reduces 
to $\det(A) = 1 \cdot \det(A_{11})$. Moreover, $A_{11} = I_{n-1}$, so by induction, $\det(A_{11}) = 1$ and 
$\det(I_n) = 1$. 

(ii) To prove linearity in the rows, we return to the notation introduced in (1.4.8). We show 
linearity of each of the terms in the expansion (1.4.5), i.e., that 

\[
d_{\nu 1} \det(D_{\nu 1}) = c\, a_{\nu 1} \det(A_{\nu 1}) + c'\, b_{\nu 1} \det(B_{\nu 1}) \tag{1.4.14}
\]

for every index ν. Let k be as in (1.4.8). 


Case 1: ν = k. The row that we operate on has been deleted from the minors $A_{k1}, B_{k1}, D_{k1}$, so 
they are equal, and the values of det on them are equal too. On the other hand, $a_{k1}, b_{k1}, d_{k1}$ 
are the first entries of the rows $A_k, B_k, D_k$, respectively. So $d_{k1} = c\,a_{k1} + c'\,b_{k1}$, and (1.4.14) 
follows. 


Case 2: ν ≠ k. If we let $A'_k, B'_k, D'_k$ denote the vectors obtained from the rows $A_k, B_k, D_k$, 
respectively, by dropping the first entry, then $A'_k$ is a row of the minor $A_{\nu 1}$, etc. Here 
$D'_k = cA'_k + c'B'_k$, and by induction on n, $\det(D_{\nu 1}) = c \det(A_{\nu 1}) + c' \det(B_{\nu 1})$. On the 
other hand, since ν ≠ k, the coefficients $a_{\nu 1}, b_{\nu 1}, d_{\nu 1}$ are equal. So (1.4.14) is true in this case 
as well. 


(iii) Suppose that rows k and k + 1 of a matrix A are equal. Unless ν = k or k + 1, the minor 
$A_{\nu 1}$ has two rows equal, and its determinant is zero by induction. Therefore, at most two 
terms in (1.4.5) are different from zero. On the other hand, deleting either of the equal rows 
gives us the same matrix. So $a_{k1} = a_{k+1,1}$ and $A_{k1} = A_{k+1,1}$. Then 

\[
\det(A) = \pm\, a_{k1} \det(A_{k1}) \mp a_{k+1,1} \det(A_{k+1,1}) = 0.
\]

This completes the proof of Theorem 1.4.7. □ 

Corollary 1.4.15 

(a) A square matrix A is invertible if and only if its determinant is different from zero. If A 
is invertible, then $\det(A^{-1}) = (\det A)^{-1}$. 

(b) The determinant of a matrix A is equal to the determinant of its transpose $A^t$. 

(c) Properties (1.4.7) and (1.4.10) continue to hold if the word row is replaced by the word 
column throughout. 


Proof. (a) If A is invertible, then it is a product of elementary matrices, say $A = E_1 \cdots E_k$ 
(1.2.16). Then $\det A = (\det E_1) \cdots (\det E_k)$. The determinants of elementary matrices are 
nonzero (1.4.13), so det A is nonzero too. If A is not invertible, there are elementary matrices 
$E_1, \ldots, E_k$ such that the bottom row of $A' = E_k \cdots E_1 A$ is zero (1.2.15). Then det A’ = 0, and 
det A = 0 as well. If A is invertible, then $\det(A^{-1}) \det A = \det(A^{-1}A) = \det I = 1$, therefore 
$\det(A^{-1}) = (\det A)^{-1}$. 

(b) It is easy to check that $\det E = \det E^t$ if E is an elementary matrix. If A is invertible, 
we write $A = E_1 \cdots E_k$ as before. Then $A^t = E_k^t \cdots E_1^t$, and by the multiplicative property, 
$\det A = \det A^t$. If A is not invertible, neither is $A^t$. Then both det A and $\det A^t$ are zero. 

(c) This follows from (b). □ 


1.5 PERMUTATIONS 

A permutation of a set S is a bijective map p from a set S to itself: 

(1.5.1) p : S → S. 

The table 

\[
\begin{array}{c|ccccc} i & 1 & 2 & 3 & 4 & 5 \\ \hline p(i) & 3 & 5 & 4 & 1 & 2 \end{array} \tag{1.5.2}
\]


exhibits a permutation p of the set {1, 2, 3, 4, 5} of five indices: p(1) = 3, etc. It is bijective 
because every index appears exactly once in the bottom row. 

The set of all permutations of the indices {1, 2, ... , n} is called the symmetric group, 
and is denoted by $S_n$. It will be discussed in Chapter 2. 

The benefit of this definition of a permutation is that it permits composition of 
permutations to be defined as composition of functions. If q is another permutation, then 
doing first p then q means composing the functions: q ∘ p. The composition is called the 
product permutation, and will be denoted by qp. 


Note: People sometimes like to think of a permutation of the indices 1, ..., n as a list of 
the same indices in a different order, as in the bottom row of (1.5.2). This is not good for 
us. In mathematics one wants to keep track of what happens when one performs two or 
more permutations in succession. For instance, we may want to obtain a permutation by 
repeatedly switching pairs of indices. Then unless things are written carefully, keeping track 
of what has been done becomes a nightmare. □ 


The tabular form shown above is cumbersome. It is more common to use cycle notation. 
To write a cycle notation for the permutation p shown above, we begin with an arbitrary 
index, say 3, and follow it along: p(3) = 4, p(4) = 1, and p(1) = 3. The string of three 
indices forms a cycle for the permutation, which is denoted by 


(1.5.3) (341). 


This notation is interpreted as follows: the index 3 is sent to 4, the index 4 is sent to 1, and 
the parenthesis at the end indicates that the index 1 is sent back to 3 at the front by the 
permutation: 3 → 4 → 1 → 3. 

Because there are three indices, this is a 3-cycle. 

Also, p(2) = 5 and p(5) = 2, so with the analogous notation, the two indices 2, 5 form 
a 2-cycle (25). 2-cycles are called transpositions. 


The complete cycle notation for p is obtained by writing these cycles one after the 
other: 


(1.5.4) p = (341) (25). 


The permutation can be read off easily from this notation. 
One slight complication is that the cycle notation isn’t unique, for two reasons. First, 
we might have started with an index different from 3. Thus 


(341), (134) and (413) 


are notations for the same 3-cycle. Second, the order in which the cycles are written doesn’t 
matter. Cycles made up of disjoint sets of indices can be written in any order. We might just 
as well write 

p = (52) (134). 


The indices (which are 1, 2, 3, 4, 5 here) may be grouped into cycles arbitrarily, and the 
result will be a cycle notation for some permutation. For example, (3 4)(2)(15) represents 
the permutation that switches two pairs of indices, while fixing 2. However, 1-cycles, the 
indices that are left fixed, are often omitted from the cycle notation. We might write this 
permutation as (3 4)(15). The 4-cycle 


(1.5.5) q = (1452) 


is interpreted as meaning that the missing index 3 is left fixed. Then in a cycle notation for a 
permutation, every index appears at most once. (Of course this convention assumes that the 
set of indices is known.) The one exception to this rule is for the identity permutation. We’d 
rather not use the empty symbol to denote this permutation, so we denote it by 1. 

To compute the product permutation qp, with p and q as above, we follow the indices 
through the two permutations, but we must remember that qp means q ∘ p, “first do p, then 
q.” So since p sends 3 → 4 and q sends 4 → 5, qp sends 3 → 5. Unfortunately, we read 
cycles from left to right, but we have to run through the permutations from right to left, in a 
zig-zag fashion. This takes some getting used to, but in the end it is not difficult. The result 
in our case is a 3-cycle: 

\[
qp = \underbrace{[(1\,4\,5\,2)]}_{\text{then this}} \circ \underbrace{[(3\,4\,1)(2\,5)]}_{\text{first do this}} = (1\,3\,5),
\]

the missing indices 2 and 4 being left fixed. On the other hand, 

\[
pq = (2\,3\,4).
\]


Composition of permutations is not a commutative operation. 
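
Since a permutation is a function, products can be computed by composing functions, and this is easy to mimic in code. In the sketch below a permutation of {1, ..., n} is stored as a dictionary i ↦ p(i); the helper names `from_cycles`, `compose`, and `to_cycles` are our own.

```python
def from_cycles(cycles, n):
    """Permutation of {1, ..., n} as a dict, built from disjoint cycles."""
    p = {i: i for i in range(1, n + 1)}      # indices not mentioned are fixed
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            p[a] = b                          # each index is sent to its successor
    return p

def compose(q, p):
    """The product qp: first do p, then q."""
    return {i: q[p[i]] for i in p}

def to_cycles(p):
    """Cycle notation for p, with 1-cycles omitted."""
    seen, cycles = set(), []
    for start in sorted(p):
        cyc, i = [], start
        while i not in seen:
            seen.add(i)
            cyc.append(i)
            i = p[i]
        if len(cyc) > 1:
            cycles.append(tuple(cyc))
    return cycles

p = from_cycles([(3, 4, 1), (2, 5)], 5)
q = from_cycles([(1, 4, 5, 2)], 5)
```

With these definitions, `to_cycles(compose(q, p))` and `to_cycles(compose(p, q))` reproduce the two products computed above, confirming that they differ.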


There is a permutation matrix P associated to any permutation p. Left multiplication 
by this permutation matrix permutes the entries of a vector X using the permutation p. 

For example, if there are three indices, the matrix P associated to the cyclic permutation 
p = (123) and its operation on a column vector are as follows: 


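\[
PX = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} x_3 \\ x_1 \\ x_2 \end{bmatrix}. \tag{1.5.6}
\]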


Multiplication by P shifts the first entry of the vector X to the second position and so on. 

It is essential to write the matrix of an arbitrary permutation down carefully, and to 
check that the matrix associated to a product pq of permutations is the product matrix PQ. 
The matrix associated to a transposition (25) is an elementary matrix of the second type, 
the one that interchanges the two corresponding rows. This is easy to see. But for a general 
permutation, determining the matrix can be confusing. 


• To write a permutation matrix explicitly, it is best to use the n×n matrix units $e_{ij}$, the 
matrices with a single 1 in the i, j position that were defined before (1.1.21). The matrix 
associated to a permutation p of $S_n$ is 

\[
P = \sum_i e_{pi,i}. \tag{1.5.7}
\]

(In order to make the subscript as compact as possible, we have written pi for p(i).) 

This matrix acts on the vector $X = \sum_j e_j x_j$ as follows: 

\[
PX = \Bigl(\sum_i e_{pi,i}\Bigr)\Bigl(\sum_j e_j x_j\Bigr)
= \sum_{i,j} e_{pi,i}\, e_j x_j
= \sum_i e_{pi,i}\, e_i x_i
= \sum_i e_{pi} x_i. \tag{1.5.8}
\]

This computation is made using formula (1.1.25). The terms $e_{pi,i}\, e_j$ in the double sum are 
zero when i ≠ j. 

To express the right side of (1.5.8) as a column vector, we have to reindex so that the 
standard basis vectors on the right are in the correct order, $e_1, \ldots, e_n$, rather than in the 
permuted order $e_{p1}, \ldots, e_{pn}$. We set pi = k and $i = p^{-1}k$. Then 

\[
\sum_i e_{pi} x_i = \sum_k e_k x_{p^{-1}k}. \tag{1.5.9}
\]

This is a confusing point: Permuting the entries $x_i$ of a vector by p permutes the 
indices by $p^{-1}$. 

For example, the 3×3 matrix P of (1.5.6) is $e_{21} + e_{32} + e_{13}$, and 

\[
PX = (e_{21} + e_{32} + e_{13})(e_1 x_1 + e_2 x_2 + e_3 x_3) = e_1 x_3 + e_2 x_1 + e_3 x_2.
\]


Proposition 1.5.10 


(a) A permutation matrix P always has a single 1 in each row and in each column, the rest 
of its entries being 0. Conversely, any such matrix is a permutation matrix. 

(b) The determinant of a permutation matrix is ±1. 

(c) Let p and q be two permutations, with associated permutation matrices P and Q. The 
matrix associated to the permutation pq is the product PQ. 


Proof. We omit the verification of (a) and (b). The computation below proves (c): 

\[
PQ = \Bigl(\sum_i e_{pi,i}\Bigr)\Bigl(\sum_j e_{qj,j}\Bigr)
= \sum_{i,j} e_{pi,i}\, e_{qj,j}
= \sum_j e_{pqj,qj}\, e_{qj,j}
= \sum_j e_{pqj,j}.
\]

This computation is made using formula (1.1.23). The terms $e_{pi,i}\, e_{qj,j}$ in the double sum are 
zero unless i = qj. So PQ is the permutation matrix associated to the product permutation 
pq, as claimed. □ 
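
Formula (1.5.7) and part (c) of the proposition can be checked on the permutations p = (341)(25) and q = (1452) used earlier. The sketch below makes its own choices: permutations are dictionaries on 1, ..., n, matrices are lists of rows indexed from 0, and the names `perm_matrix` and `matmul` are hypothetical.

```python
def perm_matrix(p):
    """P = sum of e_{p(i), i}: a single 1 in row p(i) of column i."""
    n = len(p)
    P = [[0] * n for _ in range(n)]
    for i in range(1, n + 1):
        P[p[i] - 1][i - 1] = 1            # shift from 1-based indices to 0-based lists
    return P

def matmul(A, B):
    """Matrix product AB, rows of A against columns of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

p = {1: 3, 2: 5, 3: 4, 4: 1, 5: 2}        # p = (341)(25)
q = {1: 4, 2: 1, 3: 3, 4: 5, 5: 2}        # q = (1452)
pq = {i: p[q[i]] for i in q}              # the product pq: first q, then p
```

One finds that `matmul(perm_matrix(p), perm_matrix(q))` equals `perm_matrix(pq)`, and that the matrix of the 3-cycle (123) moves the first entry of a column vector to the second position, as in (1.5.6).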


• The determinant of the permutation matrix associated to a permutation p is called the 
sign of the permutation: 

\[
\operatorname{sign} p = \det P = \pm 1. \tag{1.5.11}
\]

A permutation p is even if its sign is +1, and odd if its sign is −1. The permutation (123) has 
sign +1. It is even, while any transposition, such as (12), has sign −1 and is odd. 

Every permutation can be written as a product of transpositions in many ways. If a 
permutation p is equal to the product $\tau_1 \cdots \tau_k$, where the $\tau_i$ are transpositions, the number k 
will always be even if p is an even permutation and it will always be odd if p is an odd 
permutation. 
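
In practice the sign is usually computed without forming a determinant. The sketch below does not use the definition (1.5.11) directly; it relies instead on the standard fact, not proved in this section, that a cycle of length k is a product of k − 1 transpositions, so cycles of even length contribute a factor −1. The name `sign` and the dictionary representation are our own.

```python
def sign(p):
    """Sign of a permutation given as a dict i -> p(i)."""
    seen, s = set(), 1
    for start in p:
        length, i = 0, start
        while i not in seen:              # walk around the cycle containing start
            seen.add(i)
            i = p[i]
            length += 1
        if length > 0 and length % 2 == 0:
            s = -s                        # a cycle of even length is an odd permutation
    return s
```

For example, the 3-cycle (123) is even, a transposition is odd, and p = (341)(25) is odd because it consists of one even cycle and one transposition.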

This completes our discussion of permutations and permutation matrices. We will come 
back to them in Chapters 7 and 10. 


1.6 OTHER FORMULAS FOR THE DETERMINANT 


There are formulas analogous to our definition (1.4.5) of the determinant that use expansions 
by minors on other columns of a matrix, and also ones that use expansions on rows. 
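
Again, the notation $A_{ij}$ stands for the matrix obtained by deleting the ith row and the 
jth column of a matrix A. 


Expansion by minors on the jth column: 

\[
\det A = (-1)^{1+j} a_{1j} \det A_{1j} + (-1)^{2+j} a_{2j} \det A_{2j} + \cdots + (-1)^{n+j} a_{nj} \det A_{nj},
\]

or in summation notation, 

\[
\det A = \sum_{\nu=1}^{n} (-1)^{\nu+j} a_{\nu j} \det A_{\nu j}. \tag{1.6.1}
\]

Expansion by minors on the ith row: 

\[
\det A = (-1)^{i+1} a_{i1} \det A_{i1} + (-1)^{i+2} a_{i2} \det A_{i2} + \cdots + (-1)^{i+n} a_{in} \det A_{in},
\]

\[
\det A = \sum_{\nu=1}^{n} (-1)^{i+\nu} a_{i\nu} \det A_{i\nu}. \tag{1.6.2}
\]

For example, expansion on the second row gives 

\[
\det \begin{bmatrix} 1 & 1 & 2 \\ 0 & 2 & 1 \\ 1 & 0 & 2 \end{bmatrix}
= -0 \cdot \det \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix}
+ 2 \cdot \det \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}
- 1 \cdot \det \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}
= 1.
\]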

To verify that these formulas yield the determinant, one can check the properties (1.4.7). 
The alternating signs that appear in the formulas can be read off of this figure: 

\[
\begin{bmatrix} + & - & + & \cdots \\ - & + & - & \\ + & - & + & \\ \vdots & & & \ddots \end{bmatrix} \tag{1.6.3}
\]

The notation $(-1)^{i+j}$ for the alternating sign may seem pedantic, and harder to remember 
than the figure. However, it is useful because it can be manipulated by the rules of algebra. 


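We describe one more expression for the determinant, the complete expansion. The
complete expansion is obtained by using linearity to expand on all the rows, first on (row 1),
then on (row 2), and so on. For a $2 \times 2$ matrix, this expansion is made as follows:

$\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = a \det \begin{bmatrix} 1 & 0 \\ c & d \end{bmatrix} + b \det \begin{bmatrix} 0 & 1 \\ c & d \end{bmatrix}$

$= ac \det \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} + ad \det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + bc \det \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + bd \det \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}.$

The first and fourth terms in the final expansion are zero, and

$\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad \det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + bc \det \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = ad - bc.$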


Carrying this out for $n \times n$ matrices leads to the complete expansion of the determinant,
the formula

(1.6.4)   $\det A = \sum_{\text{perm } p} (\operatorname{sign} p)\, a_{1,p1} \cdots a_{n,pn},$

in which the sum is over all permutations of the n indices, and (sign p) is the sign of the
permutation.
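Formula (1.6.4) can be evaluated literally for small matrices. The sketch below is an illustration, not the text's method of computing signs: it assumes 0-based indices, takes the sign of a permutation from the parity of its number of inversions (a standard equivalent description that is not stated in the text), and uses the hypothetical names `sign` and `det_complete`.

```python
from itertools import permutations
from math import prod


def sign(perm):
    """Sign of a permutation of 0..n-1, from the parity of its inversion count."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det_complete(a):
    """Complete expansion: one signed product of n entries per permutation."""
    n = len(a)
    return sum(
        sign(p) * prod(a[i][p[i]] for i in range(n))
        for p in permutations(range(n))
    )


print(det_complete([[1, 2], [3, 4]]))                   # -2, that is ad - bc
print(det_complete([[1, 1, 2], [0, 2, 1], [1, 0, 2]]))  # 1
print(len(list(permutations(range(3)))))                # 6 terms for a 3 x 3 matrix
```

The sum has n! terms, which is why the complete expansion is of little use for computation once n grows.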

For a $2 \times 2$ matrix, the complete expansion gives us back Formula (1.4.2). For a $3 \times 3$
matrix, the complete expansion has six terms, because there are six permutations of three
indices:

(1.6.5)   $\det A = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}.$


As an aid for remembering this expansion, one can display the block matrix [A|A]: 


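(1.6.6)   $\left[\begin{array}{ccc|ccc} a_{11} & a_{12} & a_{13} & a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} & a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} & a_{31} & a_{32} & a_{33} \end{array}\right]$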


The three terms with positive signs are the products of the terms along the three diagonals 
that go downward from left to right, and the three terms with negative signs are the products 
of terms on the diagonals that go downward from right to left. 


Warning: The analogous method will not work with $4 \times 4$ determinants.


The complete expansion is more of theoretical than of practical importance. Unless
n is small or the matrix is very special, it has too many terms to be useful for computation.
Its theoretical importance comes from the fact that determinants are exhibited
as polynomials in the $n^2$ variable matrix entries $a_{ij}$ with coefficients $\pm 1$. For example,
if each matrix entry $a_{ij}$ is a differentiable function of a variable t, then because sums
and products of differentiable functions are differentiable, det A is also a differentiable
function of t.


The Cofactor Matrix 
The cofactor matrix of an $n \times n$ matrix A is the $n \times n$ matrix cof(A) whose i, j entry is

(1.6.7)   $\operatorname{cof}(A)_{ij} = (-1)^{i+j} \det A_{ji},$

where, as before, $A_{ji}$ is the matrix obtained by crossing out the jth row and the ith column.
So the cofactor matrix is the transpose of the matrix made up of the $(n-1) \times (n-1)$ minors
of A, with signs as in (1.6.3). This matrix is used to provide a formula for the inverse matrix.

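If you need to compute a cofactor matrix, it is safest to make the computation in three
steps: First compute the matrix whose i, j entry is the minor $\det A_{ij}$, then adjust signs, and
finally transpose. Here is the computation for a particular $3 \times 3$ matrix:

(1.6.8)   $\begin{bmatrix} 1 & 1 & 2 \\ 0 & 2 & 1 \\ 1 & 0 & 2 \end{bmatrix}: \quad \begin{bmatrix} 4 & -1 & -2 \\ 2 & 0 & -1 \\ -3 & 1 & 2 \end{bmatrix} \to \begin{bmatrix} 4 & 1 & -2 \\ -2 & 0 & 1 \\ -3 & -1 & 2 \end{bmatrix} \to \begin{bmatrix} 4 & -2 & -3 \\ 1 & 0 & -1 \\ -2 & 1 & 2 \end{bmatrix} = \operatorname{cof}(A).$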


Theorem 1.6.9 Let A be an $n \times n$ matrix, let C = cof(A) be its cofactor matrix, and let
$\alpha = \det A$. If $\alpha \neq 0$, then A is invertible, and $A^{-1} = \alpha^{-1} C$. In any case, $CA = AC = \alpha I$.


Here $\alpha I$ is the diagonal matrix with diagonal entries equal to $\alpha$. For the inverse of a $2 \times 2$
matrix, the theorem gives us back Formula 1.1.17. The determinant of the $3 \times 3$ matrix A
whose cofactor matrix is computed in (1.6.8) above happens to be 1, so for that matrix,
$A^{-1} = \operatorname{cof}(A)$.
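Theorem 1.6.9 is easy to test numerically. The sketch below assumes integer matrices stored as lists of rows and uses exact arithmetic with `fractions.Fraction` for the inverse; the function names are hypothetical, and the example matrix with rows (1, 1, 2), (0, 2, 1), (1, 0, 2) is chosen here for illustration because its determinant is 1.

```python
from fractions import Fraction


def minor(a, i, j):
    """Delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by expansion by minors on the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def cofactor_matrix(a):
    """Entry (i, j) is (-1)^(i+j) det A_ji: note the transposed indices."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(a):
    """Inverse as (det A)^(-1) cof(A), valid when det A is nonzero."""
    alpha = det(a)
    if alpha == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(c, alpha) for c in row] for row in cofactor_matrix(a)]


A = [[1, 1, 2], [0, 2, 1], [1, 0, 2]]
C = cofactor_matrix(A)
print(C)             # [[4, -2, -3], [1, 0, -1], [-2, 1, 2]]
print(matmul(C, A))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]], since det A = 1

S = [[1, 2], [2, 4]]                  # a singular matrix
print(matmul(cofactor_matrix(S), S))  # [[0, 0], [0, 0]], that is (det S) I
```

The last line illustrates the "in any case" part of the theorem: the identity $CA = \alpha I$ still holds when $\alpha = 0$.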


Proof of Theorem 1.6.9. We show that the i, j entry of the product CA is equal to $\alpha$ if i = j
and is zero otherwise. Let $A_i$ denote the ith column of A. Denoting the entries of C and A
by $c_{ij}$ and $a_{ij}$, the i, j entry of the product CA is

(1.6.10)   $\sum_{\nu} c_{i\nu} a_{\nu j} = \sum_{\nu} (-1)^{\nu+i} \det A_{\nu i}\, a_{\nu j}.$

When i = j, this is the formula (1.6.1) for the determinant by expansion by minors on
column j. So the diagonal entries of CA are equal to $\alpha$, as claimed.

Suppose that $i \neq j$. We form a new matrix M in the following way: The entries of M are
equal to the entries of A, except for those in column i. The ith column $M_i$ of M is equal to
the jth column $A_j$ of A. Thus the ith and the jth columns of M are both equal to $A_j$, and
det M = 0.

Let D be the cofactor matrix of M, with entries $d_{ij}$. The i, i entry of DM is

$\sum_{\nu} d_{i\nu} m_{\nu i} = \sum_{\nu} (-1)^{\nu+i} \det M_{\nu i}\, m_{\nu i}.$

This sum is equal to det M, which is zero.

On the other hand, since the ith column of M is crossed out when forming $M_{\nu i}$, that
minor is equal to $A_{\nu i}$. And since the ith column of M is equal to the jth column of A,
$m_{\nu i} = a_{\nu j}$. So the i, i entry of DM is also equal to

$\sum_{\nu} (-1)^{\nu+i} \det A_{\nu i}\, a_{\nu j},$

which is the i, j entry of CA that we want to determine. Therefore the i, j entry of CA is
zero, and $CA = \alpha I$, as claimed. It follows that $A^{-1} = \alpha^{-1} \operatorname{cof}(A)$ if $\alpha \neq 0$. The computation
of the product AC is done in a similar way, using expansion by minors on rows. □


A general algebraical determinant in its developed form 

may be likened to a mixture of liquids seemingly homogeneous, 

but which, being of differing boiling points, admit of being separated 
by the process of fractional distillation. 


—James Joseph Sylvester 


EXERCISES 
Section 1 The Basic Operations 
1.1. What are the entries $a_{21}$ and $a_{23}$ of the matrix $A = \begin{bmatrix} 1 & 2 & 5 \\ 2 & 7 & 8 \\ 0 & 9 & 4 \end{bmatrix}$?


1.2. Determine the products AB and BA for the following values of A and B: 


[matrices A and B illegible in source]


1.3. Let $A = [a_1 \cdots a_n]$ be a row vector, and let $B = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$ be a column vector. Compute
the products AB and BA.


1.4. Verify the associative law for the matrix product [matrices illegible in source].


Note: This is a self-checking problem. It won’t come out unless you multiply correctly. If 
you need to practice matrix multiplication, use this problem as a model. 


1.5.³ Let A, B, and C be matrices of sizes $\ell \times m$, $m \times n$, and $n \times p$. How many multiplications
are required to compute the product AB? In which order should the triple product ABC
be computed, so as to minimize the number of multiplications required?


1.6. Compute [two matrix powers with exponent n; matrices illegible in source].


1.7. Find a formula for [matrix power with exponent n; matrix illegible in source], and prove it by induction.


³ Suggested by Gilbert Strang.


1.8. Compute the following products by block multiplication:

[block matrices illegible in source]

1.9. Let A, B be square matrices. 
(a) When is $(A + B)(A - B) = A^2 - B^2$? (b) Expand $(A + B)^3$.
1.10. Let D be the diagonal matrix with diagonal entries $d_1, \ldots, d_n$, and let $A = (a_{ij})$ be an
arbitrary $n \times n$ matrix. Compute the products DA and AD.


1.11. Prove that the product of upper triangular matrices is upper triangular. 
1.12. In each case, find all $2 \times 2$ matrices that commute with the given matrix.

[five $2 \times 2$ matrices; only the first rows (1 0), (0 1), (2 0), (1 3), (2 3) survive in the source]

1.13. A square matrix A is nilpotent if $A^k = 0$ for some k > 0. Prove that if A is nilpotent, then
I + A is invertible. Do this by finding the inverse.

1.14. Find infinitely many matrices B such that $BA = I_2$ when

$A = \begin{bmatrix} 2 & 3 \\ 1 & 2 \\ 1 & 1 \end{bmatrix},$

and prove that there is no matrix C such that $AC = I_3$.

1.15. With A arbitrary, determine the products $e_{ij}A$, $Ae_{ij}$, $e_jAe_k$, $e_{ii}Ae_{jj}$, and $e_{ij}Ae_{k\ell}$.


Section 2 Row Reduction 


2.1. For the reduction of the matrix M (1.2.8) given in the text, determine the elementary
matrices corresponding to each operation. Compute the product P of these elementary 
matrices and verify that PM is indeed the end result. 


2.2. Find all solutions of the system of equations AX = B when 


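$A = \begin{bmatrix} 1 & 2 & 1 & 1 \\ 3 & 0 & 0 & 4 \\ 1 & -4 & 2 & 2 \end{bmatrix}$ and B = (a) $\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$, (b) $\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$, (c) $\begin{bmatrix} 0 \\ 2 \\ 2 \end{bmatrix}$.


2.3. Find all solutions of the equation $x_1 + x_2 + 2x_3 - x_4 = 3$.


2.4. Determine the elementary matrices used in the row reduction in Example (1.2.18), and
verify that their product is $A^{-1}$.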


2.5. Find inverses of the following matrices:

[matrices illegible in source]


2.6. The matrix below is based on the Pascal triangle. Find its inverse.

$\begin{bmatrix} 1 & & & & \\ 1 & 1 & & & \\ 1 & 2 & 1 & & \\ 1 & 3 & 3 & 1 & \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix}$


2.7. Make a sketch showing the effect of multiplication by the matrix A = [matrix illegible in source] on
the plane $\mathbb{R}^2$.


2.8. Prove that if a product AB of $n \times n$ matrices is invertible, so are the factors A and B.


2.9. Consider an arbitrary system of linear equations AX = B, where A and B are real
matrices.

(a) Prove that if the system of equations AX = B has more than one solution then it has
infinitely many.

(b) Prove that if there is a solution in the complex numbers then there is also a real
solution.


2.10. Let A be a square matrix. Show that if the system AX = B has a unique solution for some
particular column vector B, then it has a unique solution for all B.


Section 3 The Matrix Transpose 


3.1. A matrix B is symmetric if $B = B^t$. Prove that for any square matrix B, $BB^t$ and $B + B^t$
are symmetric, and that if A is invertible, then $(A^{-1})^t = (A^t)^{-1}$.


3.2. Let A and B be symmetric $n \times n$ matrices. Prove that the product AB is symmetric if and
only if AB = BA.


3.3. Suppose we make first a row operation, and then a column operation, on a matrix A.
Explain what happens if we switch the order of these operations, making the column
operation first, followed by the row operation.


3.4. How much can a matrix be simplified if both row and column operations are allowed?


Section 4 Determinants 


4.1. Evaluate the following determinants:

[matrices illegible in source]


4.2. (self-checking) Verify the rule det AB = (det A)(det B) for the matrices

[matrices A and B illegible in source]


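4.3. Compute the determinant of the following $n \times n$ matrix using induction on n:

$\begin{bmatrix} 2 & -1 & & & & \\ -1 & 2 & -1 & & & \\ & -1 & 2 & -1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & -1 & 2 & -1 \\ & & & & -1 & 2 \end{bmatrix}$


4.4. Let A be an $n \times n$ matrix. Determine det(-A) in terms of det A.
4.5. Use row reduction to prove that $\det A^t = \det A$.
4.6. Prove that $\det \begin{bmatrix} A & B \\ 0 & D \end{bmatrix} = (\det A)(\det D)$, if A and D are square blocks.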
Section 5 Permutation Matrices
5.1. Write the following permutations as products of disjoint cycles:
(12)(13)(14)(15), (123)(234)(345), (1234)(2345), (12)(23)(34)(45)(51).
5.2. Let p be the permutation (1342) of four indices. 
(a) Find the associated permutation matrix P. 


(b) Write p as a product of transpositions and evaluate the corresponding matrix product. 
(c) Determine the sign of p. 


5.3. Prove that the inverse of a permutation matrix P is its transpose. 


5.4. What is the permutation matrix associated to the permutation of n indices defined by
$p(i) = n - i + 1$? What is the cycle decomposition of p? What is its sign?


5.5. In the text, the products qp and pq of the permutations (1.5.2) and (1.5.5) were seen to
be different. However, both products turned out to be 3-cycles. Is this an accident?
Section 6 Other Formulas for the Determinant


6.1. (a) Compute the determinants of the following matrices by expansion on the bottom
row:

[matrices illegible in source]


(b) Compute the determinants of these matrices using the complete expansion. 
(c) Compute the cofactor matrices of these matrices, and verify Theorem 1.6.9 
for them. 
6.2. Let A be an $n \times n$ matrix with integer entries $a_{ij}$. Prove that A is invertible, and that its
inverse $A^{-1}$ has integer entries, if and only if $\det A = \pm 1$.

Miscellaneous Problems


*M.1. Let a $2n \times 2n$ matrix be given in the form $M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$, where each block is an $n \times n$
matrix. Suppose that A is invertible and that AC = CA. Use block multiplication to prove
that det M = det(AD - CB). Give an example to show that this formula need not hold if
$AC \neq CA$.


M.2. Let A be an $m \times n$ matrix with m < n. Prove that A has no left inverse by comparing A
to the square $n \times n$ matrix obtained by adding (n - m) rows of zeros at the bottom.


M.3. The trace of a square matrix is the sum of its diagonal entries:

$\operatorname{trace} A = a_{11} + a_{22} + \cdots + a_{nn}.$

Show that trace (A + B) = trace A + trace B, that trace AB = trace BA, and that if B is
invertible, then $\operatorname{trace} A = \operatorname{trace} BAB^{-1}$.


M.4. Show that the equation AB - BA = I has no solution in real $n \times n$ matrices A and B.


M.5. Write the matrix [matrix illegible in source] as a product of elementary matrices, using as few as you can,
and prove that your expression is as short as possible.


M.6. Determine the smallest integer n such that every invertible $2 \times 2$ matrix can be written as
a product of at most n elementary matrices.


M.7. (Vandermonde determinant)

(a) Prove that $\det \begin{bmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{bmatrix} = (a - b)(b - c)(c - a)$.

(b) Prove an analogous formula for $n \times n$ matrices, using appropriate row operations to
clear out the first column.

(c) Use the Vandermonde determinant to prove that there is a unique polynomial p(t)
of degree n that takes arbitrary prescribed values at n + 1 points $t_0, \ldots, t_n$.


*M.8. (an exercise in logic) Consider a general system AX = B of m linear equations in n
unknowns, where m and n are not necessarily equal. The coefficient matrix A may have
a left inverse L, a matrix such that $LA = I_n$. If so, we may try to solve the system as we
learn to do in school:

AX = B,   LAX = LB,   X = LB.

But when we try to check our work by running the solution backward, we run into trouble:
If X = LB, then AX = ALB. We seem to want L to be a right inverse, which isn't what
was given.

(a) Work some examples to convince yourself that there is a problem here.
(b) Exactly what does the sequence of steps made above show? What would the existence
of a right inverse show? Explain clearly.


M.9. Let A be a real $2 \times 2$ matrix, and let $A_1$, $A_2$ be the columns of A. Let P be the parallelogram
whose vertices are 0, $A_1$, $A_2$, $A_1 + A_2$. Determine the effect of elementary row operations
on the area of P, and use this to prove that the absolute value |det A| of the determinant
of A is equal to the area of P.


*M.10.


Let A, B be $m \times n$ and $n \times m$ matrices. Prove that $I_m - AB$ is invertible if and only if
$I_n - BA$ is invertible.

Hint: Perhaps the only approach available to you at this time is to find an explicit
expression for one inverse in terms of the other. As a heuristic tool, you could try
substituting into the power series expansion for $(1 - x)^{-1}$. The substitution will make no
sense unless some series converge, and this needn't be the case. But any way to guess a
formula is permissible, provided that you check your guess afterward.


M.11.⁴ (discrete Dirichlet problem) A function f(u, v) is harmonic if it satisfies the Laplace
equation $\frac{\partial^2 f}{\partial u^2} + \frac{\partial^2 f}{\partial v^2} = 0$. The Dirichlet problem asks for a harmonic function on a plane
region R with prescribed values on the boundary. This exercise solves the discrete version
of the Dirichlet problem.

Let f be a real valued function whose domain of definition is the set of integers $\mathbb{Z}$. To
avoid asymmetry, the discrete derivative is defined on the shifted integers $\mathbb{Z} + \frac{1}{2}$ as the
first difference $f'(n + \frac{1}{2}) = f(n + 1) - f(n)$. The discrete second derivative is back on
the integers: $f''(n) = f'(n + \frac{1}{2}) - f'(n - \frac{1}{2}) = f(n + 1) - 2f(n) + f(n - 1)$.

Let f(u, v) be a function whose domain is the lattice of points in the plane with integer
coordinates. The formula for the discrete second derivative shows that the discrete version
of the Laplace equation for f is

$f(u + 1, v) + f(u - 1, v) + f(u, v + 1) + f(u, v - 1) - 4f(u, v) = 0.$

So f is harmonic if its value at a point (u, v) is the average of the values at its four
neighbors.

A discrete region R in the plane is a finite set of integer lattice points. Its boundary
$\partial R$ is the set of lattice points that are not in R, but which are at a distance 1 from some
point of R. We'll call R the interior of the region $\bar{R} = R \cup \partial R$. Suppose that a function
$\beta$ is given on the boundary $\partial R$. The discrete Dirichlet problem asks for a function f
defined on $\bar{R}$, that is equal to $\beta$ on the boundary, and that satisfies the discrete Laplace
equation at all points in the interior. This problem leads to a system of linear equations
that we abbreviate as LX = B. To set the system up, we write $\beta_{uv}$ for the given value
of the function $\beta$ at a boundary point. So $f(u, v) = \beta_{uv}$ at a boundary point (u, v). Let
$x_{uv}$ denote the unknown value of the function f(u, v) at a point (u, v) of R. We order
the points of R arbitrarily and assemble the unknowns $x_{uv}$ into a column vector X. The
coefficient matrix L expresses the discrete Laplace equation, except that when a point
of R has some neighbors on the boundary, the corresponding terms will be the given
boundary values. These terms are moved to the other side of the equation to form the
vector B.

(a) When R is the set of five points (0, 0), $(0, \pm 1)$, $(\pm 1, 0)$, there are eight boundary
points. Write down the system of linear equations in this case, and solve the Dirichlet
problem when $\beta$ is the function on $\partial R$ defined by $\beta_{uv} = 0$ if $v \le 0$ and $\beta_{uv} = 1$ if
v > 0.

(b) The maximum principle states that a harmonic function takes on its maximal value
on the boundary. Prove the maximum principle for discrete harmonic functions.

(c) Prove that the discrete Dirichlet problem has a unique solution for every region R
and every boundary function $\beta$.


⁴ I learned this problem from Peter Lax, who told me that he had learned it from my father, Emil Artin.


CHAPTER 2 


Groups 


There are few notions in mathematics that are more primitive
than that of a law of composition.


—Nicolas Bourbaki 


2.1 LAWS OF COMPOSITION 


A law of composition on a set S is any rule for combining pairs a, b of elements of S to get
another element, say p, of S. Some models for this concept are addition and multiplication
of real numbers. Matrix multiplication on the set of $n \times n$ matrices is another example.
Formally, a law of composition is a function of two variables, or a map


$S \times S \to S.$


Here S x S denotes, as always, the product set, whose elements are pairs a, b of elements 
of S. 

The element obtained by applying the law to a pair a, b is usually written using a 
notation resembling one used for multiplication or addition: 


$p = ab, \quad a \times b, \quad a \circ b, \quad a + b,$


or whatever, a choice being made for the particular law in question. The element p may be 
called the product or the sum of a and b, depending on the notation chosen. 

We will use the product notation ab most of the time. Anything done with product 
notation can be rewritten using another notation such as addition, and it will continue to be 
valid. The rewriting is just a change of notation. 

It is important to note right away that ab stands for a certain element of S, namely for
the element obtained by applying the given law to the elements denoted by a and b. Thus
if the law is matrix multiplication and if $a = \begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix}$ and $b = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}$, then ab denotes
the matrix $\begin{bmatrix} 7 & 3 \\ 4 & 2 \end{bmatrix}$. Once the product ab has been evaluated, the elements a and b cannot
be recovered from it.

With multiplicative notation, a law of composition is associative if the rule


(2.1.1)   $(ab)c = a(bc)$   (associative law)

holds for all a, b, c in S, where (ab)c means first multiply (apply the law to) a and b, then
multiply the result ab by c. A law of composition is commutative if


(2.1.2)   $ab = ba$   (commutative law)


holds for all a and b in S. Matrix multiplication is associative, but not commutative. 

It is customary to reserve additive notation a + b for commutative laws — laws such 
that a + b = b + a for all a and b. Multiplicative notation carries no implication either way 
concerning commutativity. 

The associative law is more fundamental than the commutative law, and one reason for 
this is that composition of functions is associative. Let T be a set, and let g and f be maps
(or functions) from T to T. Let $g \circ f$ denote the composed map $t \mapsto g(f(t))$: first apply f,
then g. The rule

$g, f \mapsto g \circ f$

is a law of composition on the set of maps $T \to T$. This law is associative. If f, g, and h are
three maps from T to T, then $(h \circ g) \circ f = h \circ (g \circ f)$:

$T \xrightarrow{\ f\ } T \xrightarrow{\ g\ } T \xrightarrow{\ h\ } T,$ with composites $g \circ f$ and $h \circ g$.

Both of the composed maps send an element t to h(g(f(t))).

When T contains two elements, say T = {a, b}, there are four maps $T \to T$:

i: the identity map, defined by i(a) = a, i(b) = b;
$\tau$: the transposition, defined by $\tau(a) = b$, $\tau(b) = a$;
$\alpha$: the constant function $\alpha(a) = \alpha(b) = a$;
$\beta$: the constant function $\beta(a) = \beta(b) = b$.
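The four maps just listed can be encoded and composed mechanically. The sketch below is only an illustration: it assumes each map is stored as a dictionary on T = {a, b}, the names `maps`, `compose` and `name_of` are hypothetical, and the maps are spelled out as `i`, `tau`, `alpha`, `beta`. It computes every composite g ∘ f, which is the data that a multiplication table for this law records, and it checks associativity over all triples.

```python
T = ("a", "b")

# The four maps T -> T, stored as dictionaries element -> image.
maps = {
    "i": {"a": "a", "b": "b"},      # identity
    "tau": {"a": "b", "b": "a"},    # transposition
    "alpha": {"a": "a", "b": "a"},  # constant function with value a
    "beta": {"a": "b", "b": "b"},   # constant function with value b
}


def compose(g, f):
    """The composed map g o f: first apply f, then g."""
    return {t: g[f[t]] for t in T}


def name_of(m):
    """Look up which of the four named maps is equal to m."""
    return next(name for name, candidate in maps.items() if candidate == m)


# table[(g, f)] is the name of g o f.
table = {(g, f): name_of(compose(maps[g], maps[f])) for g in maps for f in maps}

print(table[("tau", "alpha")])  # beta
print(table[("alpha", "tau")])  # alpha, so the law is not commutative

associative = all(
    compose(compose(maps[h], maps[g]), maps[f])
    == compose(maps[h], compose(maps[g], maps[f]))
    for h in maps for g in maps for f in maps
)
print(associative)  # True
```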


The law of composition on the set $\{i, \tau, \alpha, \beta\}$ of maps $T \to T$ can be exhibited in a
multiplication table:

(2.1.3)   $\begin{array}{c|cccc} & i & \tau & \alpha & \beta \\ \hline i & i & \tau & \alpha & \beta \\ \tau & \tau & i & \beta & \alpha \\ \alpha & \alpha & \alpha & \alpha & \alpha \\ \beta & \beta & \beta & \beta & \beta \end{array}$

which is to be read in this way: the entry in the row labeled g and the column labeled f is $g \circ f$.

Thus $\tau \circ \alpha = \beta$, while $\alpha \circ \tau = \alpha$. Composition of functions is not a commutative law.

Going back to a general law of composition, suppose we want to define the product of
a string of n elements of a set: $a_1 a_2 \cdots a_n = {?}$ There are various ways to do this using the
given law, which tells us how to multiply two elements. For instance, we could first use the
law to find the product $a_1 a_2$, then multiply this element by $a_3$, and so on:

$((a_1 a_2) a_3) a_4 \cdots.$

There are several other ways to form a product with the elements in the given order, but if
the law is associative, then all of them yield the same element of S. This allows us to speak
of the product of an arbitrary string of elements.


Proposition 2.1.4 Let an associative law of composition be given on a set S. There is a
unique way to define, for every integer n, a product of n elements $a_1, \ldots, a_n$ of S, denoted
temporarily by $[a_1 \cdots a_n]$, with the following properties:


(i) The product $[a_1]$ of one element is the element itself.
(ii) The product $[a_1 a_2]$ of two elements is given by the law of composition.
(iii) For any integer i in the range $1 \le i < n$, $[a_1 \cdots a_n] = [a_1 \cdots a_i][a_{i+1} \cdots a_n]$.


The right side of equation (iii) means that the two products $[a_1 \cdots a_i]$ and $[a_{i+1} \cdots a_n]$ are
formed first, and the results are then multiplied using the law of composition.


Proof. We use induction on n. The product is defined by (i) and (ii) for $n \le 2$, and it does
satisfy (iii) when n = 2. Suppose that we have defined the product of r elements when
$r \le n - 1$, and that it is the unique product satisfying (iii). We then define the product of n
elements by the rule


$[a_1 \cdots a_n] = [a_1 \cdots a_{n-1}][a_n],$


where the terms on the right side are those already defined. If a product satisfying (iii) exists,
then this formula gives the product because it is (iii) when i = n - 1. So if the product of n
elements exists, it is unique. We must now check (iii) for i < n - 1:


$[a_1 \cdots a_n] = [a_1 \cdots a_{n-1}][a_n]$   (our definition)
$= ([a_1 \cdots a_i][a_{i+1} \cdots a_{n-1}])[a_n]$   (induction hypothesis)
$= [a_1 \cdots a_i]([a_{i+1} \cdots a_{n-1}][a_n])$   (associative law)
$= [a_1 \cdots a_i][a_{i+1} \cdots a_n]$   (induction hypothesis).


This completes the proof. We will drop the brackets from now on and denote the product by
$a_1 \cdots a_n$. □
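The content of Proposition 2.1.4 can be observed experimentally. The sketch below is an illustration under stated assumptions: the function `all_products` (a hypothetical name) collects the values obtained from every way of inserting parentheses into a string of elements, keeping their order; $2 \times 2$ integer matrices are stored as tuples of tuples so that they can be placed in a set; and subtraction of integers is used as a contrasting law that is not associative.

```python
def all_products(elems, law):
    """Set of values from every parenthesization of elems, order kept."""
    if len(elems) == 1:
        return {elems[0]}
    results = set()
    # Split the string as [a_1 ... a_i][a_(i+1) ... a_n] for each i.
    for i in range(1, len(elems)):
        for left in all_products(elems[:i], law):
            for right in all_products(elems[i:], law):
                results.add(law(left, right))
    return results


def matmul(a, b):
    """Product of 2 x 2 matrices stored as tuples of tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def subtract(s, t):
    return s - t


mats = (
    ((1, 2), (3, 4)),
    ((0, 1), (1, 0)),
    ((2, 0), (1, 1)),
    ((1, 1), (0, 1)),
)

print(len(all_products(mats, matmul)))             # 1: every grouping agrees
print(len(all_products((1, 2, 3, 4), subtract)) > 1)  # True: groupings disagree
```

For an associative law the set always has a single element, which is the product $a_1 \cdots a_n$ of the proposition.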


An identity for a law of composition is an element e of S such that 


(2.1.5)   $ea = a$ and $ae = a$, for all a in S.


There can be at most one identity, for if e and e’ are two such elements, then since e is an 
identity, ee’ = e’, and since e’ is an identity, e = ee’. Thus e = ee’ =e’. 

Both matrix multiplication and composition of functions have an identity. For $n \times n$
matrices it is the identity matrix I, and for the set of maps $T \to T$ it is the identity map, the
map that carries each element of T to itself.

• The identity element will often be denoted by 1 if the law of composition is written
multiplicatively, and by 0 if the law is written additively. These elements do not need to be
related to the numbers 1 and 0, but they share the property of being identity elements for
their laws of composition.


Suppose that a law of composition on a set S, written multiplicatively, is associative 
and has an identity 1. An element a of S is invertible if there is another element b such that 


ab=1 and ba=1, 


and if so, then b is called the inverse of a. The inverse of an element is usually denoted by
$a^{-1}$, or when additive notation is being used, by -a.

We list without proof some elementary properties of inverses. All but the last have 
already been discussed for matrices. For an example that illustrates the last statement, see 
Exercise 1.3. 


• If an element a has both a left inverse $\ell$ and a right inverse r, i.e., if $\ell a = 1$ and
ar = 1, then $\ell = r$, a is invertible, and r is its inverse.

• If a is invertible, its inverse is unique.

• Inverses multiply in the opposite order: If a and b are invertible, so is the product
ab, and $(ab)^{-1} = b^{-1}a^{-1}$.

• An element a may have a left inverse or a right inverse, though it is not invertible.


Power notation may be used for an associative law: With n > 0, $a^n = a \cdots a$ (n factors),
$a^{-n} = a^{-1} \cdots a^{-1}$, and $a^0 = 1$. The usual rules for manipulation of powers hold: $a^r a^s = a^{r+s}$
and $(a^r)^s = a^{rs}$. When additive notation is used for the law of composition, the power
notation $a^n$ is replaced by the notation $na = a + \cdots + a$.

Fraction notation $\frac{b}{a}$ is not advisable unless the law of composition is commutative,
because it isn't clear from the notation whether the fraction stands for $ba^{-1}$ or for $a^{-1}b$, and
these two elements may be different.


2.2 GROUPS AND SUBGROUPS 
A group is a set G together with a law of composition that has the following properties:


• The law of composition is associative: (ab)c = a(bc) for all a, b, c in G.
• G contains an identity element 1, such that 1a = a and a1 = a for all a in G.
• Every element a of G has an inverse, an element b such that ab = 1 and ba = 1.


An abelian group is a group whose law of composition is commutative.

For example, the set of nonzero real numbers forms an abelian group under multiplication,
and the set of all real numbers forms an abelian group under addition. The set of
invertible $n \times n$ matrices, the general linear group, is a very important group in which the
law of composition is matrix multiplication. It is not abelian unless n = 1.

When the law of composition is evident, it is customary to denote a group and the set
of its elements by the same symbol.

The order of a group G is the number of elements that it contains. We will often denote 
the order by |G|: 


(2.2.1)   |G| = number of elements, the order, of G.

If the order is finite, G is said to be a finite group. If not, G is an infinite group. The same
terminology is used for any set. The order |S| of a set S is the number of its elements.
Here is our notation for some familiar infinite abelian groups:


(2.2.2)
$\mathbb{Z}^+$: the set of integers, with addition as its law of composition (the additive group of integers);
$\mathbb{R}^+$: the set of real numbers, with addition as its law of composition (the additive group of real numbers);
$\mathbb{R}^\times$: the set of nonzero real numbers, with multiplication as its law of composition (the multiplicative group);
$\mathbb{C}^+$, $\mathbb{C}^\times$: the analogous groups, where the set $\mathbb{C}$ of complex numbers replaces the set $\mathbb{R}$ of real numbers.


Warning: Others might use the symbol $\mathbb{R}^+$ to denote the set of positive real numbers. To
be unambiguous, it might be better to denote the additive group of reals by $(\mathbb{R}, +)$, thus
displaying its law of composition explicitly. However, our notation is more compact. Also,
the symbol $\mathbb{R}^\times$ denotes the multiplicative group of nonzero real numbers. The set of all real
numbers is not a group under multiplication because 0 isn't invertible. □


Proposition 2.2.3 Cancellation Law. Let a, b, c be elements of a group G whose law of
composition is written multiplicatively. If ab = ac or if ba = ca, then b = c. If ab = a or if
ba = a, then b = 1.


Proof. Multiply both sides of ab = ac on the left by $a^{-1}$ to obtain b = c. The other proofs
are analogous. □


Multiplication by $a^{-1}$ is essential for this proof. The Cancellation Law needn't hold when
the element a is not invertible. For instance,


[matrix example illegible in source]
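A concrete failure of cancellation is easy to produce with matrices. The matrices in the sketch below are chosen here for illustration and are not claimed to be the ones printed in the text: a has determinant 0, so it is not invertible, while b and c are distinct. The helper name `matmul` is hypothetical.

```python
def matmul(a, b):
    """Product of square matrices stored as tuples of tuples."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


a = ((1, 1), (0, 0))  # determinant 0, not invertible
b = ((1, 0), (0, 1))
c = ((0, 1), (1, 0))

print(matmul(a, b) == matmul(a, c))  # True: ab = ac
print(b == c)                        # False: yet b and c differ

a2 = ((1, 1), (0, 1))  # determinant 1, invertible
print(matmul(a2, b) == matmul(a2, c))  # False: cancellation holds for a2
```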


Two basic examples of groups are obtained from laws of composition that we have
considered (multiplication of matrices and composition of functions) by leaving out the
elements that are not invertible.


• The $n \times n$ general linear group is the group of all invertible $n \times n$ matrices. It is denoted by

(2.2.4)   $GL_n = \{ n \times n \text{ invertible matrices } A \}.$


If we want to indicate that we are working with real or with complex matrices, we write
$GL_n(\mathbb{R})$ or $GL_n(\mathbb{C})$, according to the case.

Let M be the set of maps from a set T to itself. A map $f : T \to T$ has an inverse
function if and only if it is bijective, in which case we say f is a permutation of T. The
permutations of T form a group, the law being composition of maps. As in section 1.5, we
use multiplicative notation for the composition of permutations, writing qp for $q \circ p$.


• The group of permutations of the set of indices {1, 2, ..., n} is called the symmetric group,
and is denoted by $S_n$:


(2.2.5)   $S_n$ is the group of permutations of the indices 1, 2, ..., n.

There are n! ('n factorial' $= 1 \cdot 2 \cdot 3 \cdots n$) permutations of a set of n elements, so the
symmetric group $S_n$ is a finite group of order n!.

The permutations of a set {a, b} of two elements are the identity i and the transposition
$\tau$ (see 2.1.3). They form a group of order two. If we replace a by 1 and b by 2, we see that
this is the same group as the symmetric group $S_2$. There is essentially only one group G of
order two. To see this, we note that one of its elements must be the identity 1; let the other
element be g. The multiplication table for the group contains the four products 11, 1g, g1,
and gg. All except gg are determined by the fact that 1 is the identity element. Moreover,
the Cancellation Law shows that $gg \neq g$. The only possibility is gg = 1. So the multiplication
table is completely determined. There is just one group law.

We describe the symmetric group $S_3$ next. This group, which has order six, serves
as a convenient example because it is the smallest group whose law of composition isn't
commutative. We will refer to it often. To describe it, we pick two particular permutations
in terms of which we can write all others. We take the cyclic permutation (123), and the
transposition (12), and label them as x and y, respectively. The rules


(2.2.6)   $x^3 = 1, \quad y^2 = 1, \quad yx = x^2 y$

are easy to verify. Using the cancellation law, one sees that the six elements $1, x, x^2, y, xy, x^2 y$
are distinct. So they are the six elements of the group:

(2.2.7)   $S_3 = \{1, x, x^2; y, xy, x^2 y\}.$


In the future, we will refer to (2.2.6) and (2.2.7) as our "usual presentation" of the symmetric
group $S_3$. Note that $S_3$ is not a commutative group, because $yx \neq xy$.

The rules (2.2.6) suffice for computation. Any product of the elements x and y and of
their inverses can be shown to be equal to one of the products (2.2.7) by applying the rules
repeatedly. To do so, we move all occurrences of y to the right side using the last rule, and
we use the first two rules to keep the exponents small. For instance,


(2.2.8)   $x^{-1} y^3 x^2 y = x^2 y x^2 y = x^2 (yx) xy = x^2 (x^2 y) xy = xyxy = x(x^2 y)y = 1.$

One can write out a multiplication table for $S_3$ with the aid of the rules (2.2.6), and because
of this, those rules are called defining relations for the group. We study defining relations in
Chapter 7.
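The relations (2.2.6) and the list (2.2.7) can be verified directly with permutations. The sketch below is an illustration with hypothetical helper names; it assumes 0-based tuples in which `p[i]` is the image of `i`, and it follows the convention used above for products: qp means first apply p, then q.

```python
from functools import reduce


def compose(q, p):
    """The product qp: first apply p, then q."""
    return tuple(q[p[i]] for i in range(len(p)))


identity = (0, 1, 2)
x = (1, 2, 0)  # the cyclic permutation (123), written 0-based
y = (1, 0, 2)  # the transposition (12)
x2 = compose(x, x)

# The three rules (2.2.6).
print(compose(x2, x) == identity)         # True: x^3 = 1
print(compose(y, y) == identity)          # True: y^2 = 1
print(compose(y, x) == compose(x2, y))    # True: yx = x^2 y

# The six elements (2.2.7) are distinct, so they exhaust the group.
elements = {identity, x, x2, y, compose(x, y), compose(x2, y)}
print(len(elements))                      # 6

# The sample computation (2.2.8): x^(-1) y^3 x^2 y, where x^(-1) = x^2.
word = [x2, y, y, y, x, x, y]
print(reduce(compose, word) == identity)  # True
```

Because composition is associative, folding the word from the left with `reduce` gives the same element as any other grouping.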

We stop here. The structure of S, becomes complicated very rapidly as n increases. 


One reason that the general linear groups and the symmetric groups are important is
that many other groups are contained in them as subgroups. A subset H of a group G is a
subgroup if it has the following properties:


(2.2.9)
• Closure: If a and b are in H, then ab is in H.
• Identity: 1 is in H.
• Inverses: If a is in H, then $a^{-1}$ is in H.

These conditions are explained as follows: The first one tells us that the law of composition
on the group G defines a law of composition on H, called the induced law. The second and
third conditions say that H is a group with respect to this induced law. Notice that (2.2.9)
mentions all parts of the definition of a group except for the associative law. We don't need
to mention associativity. It carries over automatically from G to the subset H.


Notes: (i) In mathematics, it is essential to learn the definition of each term. An intuitive 
feeling will not suffice. For example, the set T of invertible real (upper) triangular 2 × 2
matrices is a subgroup of the general linear group $GL_2$, and there is only one way to verify
this, namely to go back to the definition. It is true that T is a subset of $GL_2$. One must verify
that the product of invertible triangular matrices is triangular, that the identity is triangular, 
and that the inverse of an invertible triangular matrix is triangular. Of course these points 
are very easy to check. 


(ii) Closure is sometimes mentioned as one of the axioms for a group, to indicate that the 
product ab of elements of G is again an element of G. We include closure as a part of what 
is meant by a law of composition. Then it doesn’t need to be mentioned separately in the 
definition of a group. □


Examples 2.2.10 


(a) The set of complex numbers of absolute value 1, the set of points on the unit circle in 
the complex plane, is a subgroup of the multiplicative group $\mathbb{C}^\times$ called the circle group.


(b) The group of real n × n matrices with determinant 1 is a subgroup of the general linear
group $GL_n$, called the special linear group. It is denoted by $SL_n$:


(2.2.11) $SL_n(\mathbb{R})$ is the set of real n × n matrices A with determinant equal to 1.

The defining properties (2.2.9) are often very easy to verify for a particular subgroup, and
we may not carry the verification out.


• Every group G has two obvious subgroups: the group G itself, and the trivial subgroup
that consists of the identity element alone. A subgroup is a proper subgroup if it is not one 
of those two. 


2.3 SUBGROUPS OF THE ADDITIVE GROUP OF INTEGERS


We review some elementary number theory here, in terms of subgroups of the additive 
group $\mathbb{Z}^+$ of integers. To begin, we list the axioms for a subgroup when additive notation is
used in the group: A subset S of a group G with law of composition written additively is a
subgroup if it has these properties:


(2.3.1)
• Closure: If a and b are in S, then a + b is in S.
• Identity: 0 is in S.
• Inverses: If a is in S, then −a is in S.

Let a be an integer different from 0. We denote the subset of $\mathbb{Z}$ that consists of all
multiples of a by $\mathbb{Z}a$:

(2.3.2) $\mathbb{Z}a = \{n \in \mathbb{Z} \mid n = ka \text{ for some } k \text{ in } \mathbb{Z}\}$.


This is a subgroup of $\mathbb{Z}^+$. Its elements can also be described as the integers divisible by a.


Theorem 2.3.3 Let S be a subgroup of the additive group $\mathbb{Z}^+$. Either S is the trivial subgroup
{0}, or else it has the form $\mathbb{Z}a$, where a is the smallest positive integer in S.


Proof. Let S be a subgroup of $\mathbb{Z}^+$. Then 0 is in S, and if 0 is the only element of S then S
is the trivial subgroup. So that case is settled. Otherwise, S contains an integer n different
from 0, and either n or −n is positive. The third property of a subgroup tells us that −n is in
S, so in either case, S contains a positive integer. We must show that S is equal to $\mathbb{Z}a$, when
a is the smallest positive integer in S.

We first show that $\mathbb{Z}a$ is a subset of S, in other words, that ka is in S for every integer
k. If k is a positive integer, then $ka = a + a + \cdots + a$ (k terms). Since a is in S, closure and
induction show that ka is in S. Since inverses are in S, −ka is in S. Finally, 0 = 0a is in S.

Next we show that S is a subset of $\mathbb{Z}a$, that is, every element n of S is an integer
multiple of a. We use division with remainder to write n = qa + r, where q and r are integers
and where the remainder r is in the range 0 ≤ r < a. Since $\mathbb{Z}a$ is contained in S, qa is in S,
and of course n is in S. Since S is a subgroup, r = n − qa is in S too. Now by our choice, a is
the smallest positive integer in S, while the remainder r is in the range 0 ≤ r < a. The only
remainder that can be in S is 0. So r = 0 and n is the integer multiple qa of a. □


There is a striking application of Theorem 2.3.3 to subgroups that contain two integers
a and b. The set of all integer combinations ra + sb of a and b,


(2.3.4) $S = \mathbb{Z}a + \mathbb{Z}b = \{n \in \mathbb{Z} \mid n = ra + sb \text{ for some integers } r, s\}$


is a subgroup of $\mathbb{Z}^+$. It is called the subgroup generated by a and b because it is the smallest
subgroup that contains both a and b. Let’s assume that a and b aren’t both zero, so that S
is not the trivial subgroup {0}. Theorem 2.3.3 tells us that this subgroup S has the form $\mathbb{Z}d$
for some positive integer d; it is the set of integers divisible by d. The generator d is called
the greatest common divisor of a and b, for reasons that are explained in parts (a) and (b)
of the next proposition. The greatest common divisor of a and b is sometimes denoted by
gcd(a, b).


Proposition 2.3.5 Let a and b be integers, not both zero, and let d be their greatest common
divisor, the positive integer that generates the subgroup $S = \mathbb{Z}a + \mathbb{Z}b$. So $\mathbb{Z}d = \mathbb{Z}a + \mathbb{Z}b$.
Then

(a) d divides a and b.

(b) If an integer e divides both a and b, it also divides d.

(c) There are integers r and s such that d = ra + sb.


Proof. Part (c) restates the fact that d is an element of S. Next, a and b are elements of S
and $S = \mathbb{Z}d$, so d divides a and b. Finally, if an integer e divides both a and b, then e divides
the integer combination ra + sb = d. □


Note: If e divides a and b, then e divides any integer of the form ma + nb. So (c) implies 
(b). But (b) does not imply (c). As we shall see, property (c) is a powerful tool. 


One can compute a greatest common divisor easily by repeated division with remainder: 
For example, if a = 314 and b = 136, then 


$314 = 2 \cdot 136 + 42, \quad 136 = 3 \cdot 42 + 10, \quad 42 = 4 \cdot 10 + 2$.

Using the first of these equations, one can show that any integer combination of 314 and 136
can also be written as an integer combination of 136 and the remainder 42, and vice versa. So
$\mathbb{Z}(314) + \mathbb{Z}(136) = \mathbb{Z}(136) + \mathbb{Z}(42)$, and therefore gcd(314, 136) = gcd(136, 42). Similarly,
gcd(136, 42) = gcd(42, 10) = gcd(10, 2) = 2. So the greatest common divisor of 314 and 136
is 2. This iterative method of finding the greatest common divisor of two integers is called
the Euclidean Algorithm.
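
Repeated division with remainder can also keep track of the integers r and s of Proposition 2.3.5(c). The following Python sketch is one common way to do this (often called the extended Euclidean algorithm); the function name `extended_gcd` and the variable names are illustrative choices, not notation from the text.

```python
def extended_gcd(a, b):
    """Return (d, r, s) with d = gcd(a, b) >= 0 and d == r*a + s*b."""
    # Invariant: d0 == r0*a + s0*b and d1 == r1*a + s1*b at every step.
    d0, r0, s0 = a, 1, 0
    d1, r1, s1 = b, 0, 1
    while d1 != 0:
        q = d0 // d1  # division with remainder: d0 = q*d1 + (d0 - q*d1)
        d0, d1 = d1, d0 - q * d1
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
    if d0 < 0:
        # Normalize so that the generator of Za + Zb is positive.
        d0, r0, s0 = -d0, -r0, -s0
    return d0, r0, s0


d, r, s = extended_gcd(314, 136)
print(d, r, s)  # 2 13 -30, i.e. 2 = 13*314 - 30*136
```

The quotients q that occur for a = 314 and b = 136 are 2, 3, 4 and 5, the first three being those of the divisions displayed above.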

If integers a and b are given, a second way to find their greatest common divisor is 
to factor each of them into prime integers and then to collect the common prime factors. 
Properties (a) and (b) of Proposition 2.3.5 are easy to verify using this method. But without
Theorem 2.3.3, property (c), that the integer determined by this method is an integer
combination of a and b, wouldn’t be clear at all. Let’s not discuss this point further here. We
come back to it in Chapter 12. 


Two nonzero integers a and b are said to be relatively prime if the only positive integer
that divides both of them is 1. Then their greatest common divisor is 1: $\mathbb{Z}a + \mathbb{Z}b = \mathbb{Z}$.


Corollary 2.3.6 A pair a, b of integers is relatively prime if and only if there are integers r
and s such that ra + sb = 1. □


Corollary 2.3.7 Let p be a prime integer. If p divides a product ab of integers, then p 
divides a or p divides b. 


Proof. Suppose that the prime p divides ab but does not divide a. The only positive divisors 
of p are 1 and p. Since p does not divide a, gcd(a, p) = 1. Therefore there are integers r 
and s such that ra + sp = 1. We multiply by b: rab + spb = b, and we note that p divides
both rab and spb. So p divides b. □


There is another subgroup of $\mathbb{Z}^+$ associated to a pair a, b of integers, namely the
intersection $\mathbb{Z}a \cap \mathbb{Z}b$, the set of integers contained both in $\mathbb{Z}a$ and in $\mathbb{Z}b$. We assume now
that neither a nor b is zero. Then $\mathbb{Z}a \cap \mathbb{Z}b$ is a subgroup. It is not the trivial subgroup {0}
because it contains the product ab, which isn’t zero. So $\mathbb{Z}a \cap \mathbb{Z}b$ has the form $\mathbb{Z}m$ for some
positive integer m. This integer m is called the least common multiple of a and b, sometimes
denoted by lcm(a, b), for reasons that are explained in the next proposition.


Proposition 2.3.8 Let a and b be integers different from zero, and let m be their least
common multiple, the positive integer that generates the subgroup $S = \mathbb{Z}a \cap \mathbb{Z}b$. So
$\mathbb{Z}m = \mathbb{Z}a \cap \mathbb{Z}b$. Then

(a) m is divisible by both a and b.

(b) If an integer n is divisible by a and by b, then it is divisible by m.


Proof. Both statements follow from the fact that an integer is divisible by a and by b if and
only if it is contained in $\mathbb{Z}m = \mathbb{Z}a \cap \mathbb{Z}b$. □


Corollary 2.3.9 Let d = gcd(a, b) and m = lcm(a, b) be the greatest common divisor and
least common multiple of a pair a, b of positive integers, respectively. Then ab = dm.


Proof. Since b/d is an integer, a divides ab/d. Similarly, b divides ab/d. So m divides
ab/d, and dm divides ab. Next, we write d = ra + sb. Then dm = ram + sbm. Both terms
on the right are divisible by ab, so ab divides dm. Since ab and dm are positive and each
one divides the other, ab = dm. □
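
The identity ab = dm can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper `lcm_by_intersection` is a hypothetical name, and it finds m naively as the smallest positive element of $\mathbb{Z}a \cap \mathbb{Z}b$ by stepping through the positive multiples of a. It is meant to mirror the definition, not to be efficient.

```python
from math import gcd


def lcm_by_intersection(a, b):
    """Smallest positive integer lying in both Za and Zb (a and b nonzero)."""
    m = abs(a)
    while m % b != 0:  # m runs through |a|, 2|a|, 3|a|, ... until b divides m
        m += abs(a)
    return m


# Check ab = dm for a range of positive integers, as in Corollary 2.3.9.
for a in range(1, 40):
    for b in range(1, 40):
        assert a * b == gcd(a, b) * lcm_by_intersection(a, b)

print(lcm_by_intersection(314, 136))
```

The loop stops after at most |b| steps, because the product ab lies in the intersection.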


2.4 CYCLIC GROUPS


We come now to an important abstract example of a subgroup, the cyclic subgroup generated
by an arbitrary element x of a group G. We use multiplicative notation. The cyclic subgroup
H generated by x is the set of all elements that are powers of x:


(2.4.1) $H = \{\ldots, x^{-2}, x^{-1}, 1, x, x^2, \ldots\}$.


This is the smallest subgroup of G that contains x, and it is often denoted by <x>. But to
interpret (2.4.1) correctly, we must remember that the notation $x^n$ represents an element
of the group that is obtained in a particular way. Different powers may represent the same
element. For example, if G is the multiplicative group $\mathbb{R}^\times$ and x = −1, then all elements in
the list are equal to 1 or to −1, and H is the set {1, −1}.

There are two possibilities: Either the powers $x^n$ represent distinct elements, or they
do not. We analyze the case that the powers of x are not distinct.


Proposition 2.4.2 Let <x> be the cyclic subgroup of a group G generated by an element x,
and let S denote the set of integers k such that $x^k = 1$.


(a) The set S is a subgroup of the additive group $\mathbb{Z}^+$.

(b) Two powers $x^r = x^s$, with r ≥ s, are equal if and only if $x^{r-s} = 1$, i.e., if and only if r − s
is in S.

(c) Suppose that S is not the trivial subgroup. Then $S = \mathbb{Z}n$ for some positive integer n.
The powers $1, x, x^2, \ldots, x^{n-1}$ are the distinct elements of the subgroup <x>, and the
order of <x> is n.


Proof. (a) If $x^k = 1$ and $x^\ell = 1$, then $x^{k+\ell} = x^k x^\ell = 1$. This shows that if k and ℓ are in S,
then k + ℓ is in S. So the first property (2.3.1) for a subgroup is verified. Also, $x^0 = 1$, so 0 is
in S. Finally, if k is in S, i.e., $x^k = 1$, then $x^{-k} = (x^k)^{-1} = 1$ too, so −k is in S.

(b) This follows from the Cancellation Law 2.2.3.


(c) Suppose that S ≠ {0}. Theorem 2.3.3 shows that $S = \mathbb{Z}n$, where n is the smallest positive
integer in S. If $x^k$ is an arbitrary power, we divide k by n, writing k = qn + r with r in the
range 0 ≤ r < n. Then $x^{qn} = 1^q = 1$, and $x^k = x^{qn}x^r = x^r$. Therefore $x^k$ is equal to one of
the powers $1, x, \ldots, x^{n-1}$. It follows from (b) that these powers are distinct, because $x^n$ is
the smallest positive power equal to 1. □


The group <x> = $\{1, x, \ldots, x^{n-1}\}$ described by part (c) of this proposition is called a
cyclic group of order n. It is called cyclic because repeated multiplication by x cycles through
the n elements.

An element x of a group has order n if n is the smallest positive integer with the
property $x^n = 1$, which is the same thing as saying that the cyclic subgroup <x> generated
by x has order n.

With the usual presentation of the symmetric group $S_3$, the element x has order 3, and
y has order 2. In any group, the identity element is the only element of order 1.

If $x^n \neq 1$ for all n > 0, one says that x has infinite order. The matrix $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ has
infinite order in $GL_2(\mathbb{R})$, while $\begin{bmatrix} 1 & -1 \\ 1 & 0 \end{bmatrix}$ has order 6.

When x has infinite order, the group <x> is said to be infinite cyclic. We won’t have
much to say about that case.


Proposition 2.4.3 Let x be an element of finite order n in a group, and let k be an integer
that is written as k = nq + r where q and r are integers and r is in the range 0 ≤ r < n.


• $x^k = x^r$.
• $x^k = 1$ if and only if r = 0.
• Let d be the greatest common divisor of k and n. The order of $x^k$ is equal
to n/d. □
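
The three statements of Proposition 2.4.3 can be tested on a concrete element of finite order. The sketch below is an illustration only: it assumes x is the n-cycle sending i to i + 1 modulo n, stored as a Python tuple, and the helper names `compose`, `power` and `order` are hypothetical. Only nonnegative exponents k are tried.

```python
from math import gcd


def compose(p, q):
    """The permutation p∘q (apply q first); permutations are tuples of images."""
    return tuple(p[i] for i in q)


def power(p, k):
    """p^k for k >= 0, by repeated composition."""
    result = tuple(range(len(p)))
    for _ in range(k):
        result = compose(p, result)
    return result


def order(p):
    """Smallest positive m with p^m equal to the identity."""
    identity = tuple(range(len(p)))
    m, q = 1, p
    while q != identity:
        q = compose(p, q)
        m += 1
    return m


n = 12
x = tuple((i + 1) % n for i in range(n))  # an n-cycle, so x has order n
for k in range(1, 3 * n):
    r = k % n
    assert power(x, k) == power(x, r)                 # x^k = x^r
    assert (power(x, k) == power(x, 0)) == (r == 0)   # x^k = 1 iff r = 0
    assert order(power(x, k)) == n // gcd(k, n)       # order of x^k is n/d
```

For instance, with n = 12 the power $x^8$ has order 12/gcd(8, 12) = 3.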


One may also speak of the subgroup of a group G generated by a subset U. This is 
the smallest subgroup of G that contains U, and it consists of all elements of G that can be 
expressed as a product of a string of elements of U and of their inverses. A subset U of G 
is said to generate G if every element of G is such a product. For example, we saw in (2.2.7)
that the set U = {x, y} generates the symmetric group $S_3$. The elementary matrices generate
$GL_n$ (1.2.16). In both of these examples, inverses aren’t needed. That isn’t always true. An
infinite cyclic group <x> is generated by the element x, but negative powers are needed to
fill out the group.

The Klein four group V, the group consisting of the four matrices 


(2.4.4) $\begin{bmatrix} \pm 1 & 0 \\ 0 & \pm 1 \end{bmatrix}$


is the simplest group that is not cyclic. Any two of its elements different from the identity 
generate V. The quaternion group H is another example of a small group. It consists of the 
eight matrices 


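(2.4.5) $H = \{\pm\mathbf{1}, \pm\mathbf{i}, \pm\mathbf{j}, \pm\mathbf{k}\}$,


where

$\mathbf{1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \mathbf{i} = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}, \quad \mathbf{j} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \quad \mathbf{k} = \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix}$.


These matrices can be obtained from the Pauli matrices of physics by multiplying by i.
The two elements $\mathbf{i}$ and $\mathbf{j}$ generate H. Computation leads to the formulas


(2.4.6) $\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = -\mathbf{1}, \quad \mathbf{ij} = -\mathbf{ji} = \mathbf{k}, \quad \mathbf{jk} = -\mathbf{kj} = \mathbf{i}, \quad \mathbf{ki} = -\mathbf{ik} = \mathbf{j}$.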
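
The quaternion relations can be confirmed by direct matrix multiplication. The following Python sketch is illustrative only: it assumes the matrices are $\mathbf{i} = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}$, $\mathbf{j} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, $\mathbf{k} = \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix}$, stores 2 × 2 matrices as nested tuples, and uses the hypothetical helper names `mul` and `neg`.

```python
def mul(A, B):
    """Product of two 2x2 matrices stored as ((a, b), (c, d))."""
    return tuple(
        tuple(sum(A[r][t] * B[t][c] for t in range(2)) for c in range(2))
        for r in range(2)
    )


def neg(A):
    """The matrix -A."""
    return tuple(tuple(-entry for entry in row) for row in A)


one = ((1, 0), (0, 1))
i = ((1j, 0), (0, -1j))   # 1j is Python's notation for the complex number i
j = ((0, 1), (-1, 0))
k = ((0, 1j), (1j, 0))

# The relations i^2 = j^2 = k^2 = -1, ij = -ji = k, jk = -kj = i, ki = -ik = j.
assert mul(i, i) == mul(j, j) == mul(k, k) == neg(one)
assert mul(i, j) == neg(mul(j, i)) == k
assert mul(j, k) == neg(mul(k, j)) == i
assert mul(k, i) == neg(mul(i, k)) == j

# Generate the group from i and j by repeated right multiplication.
elements = [one]
frontier = [one]
while frontier:
    new = []
    for g in frontier:
        for s in (i, j):
            h = mul(g, s)
            if h not in elements:
                elements.append(h)
                new.append(h)
    frontier = new

print(len(elements))  # 8
```

Inverses are not needed in the generation step because the group is finite, so every inverse is already a positive power.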


2.5 HOMOMORPHISMS 


Let G and G' be groups, written with multiplicative notation. A homomorphism $\varphi: G \to G'$
is a map from G to G' such that for all a and b in G,


(2.5.1) $\varphi(ab) = \varphi(a)\varphi(b)$.

The left side of this equation means
first multiply a and b in G, then send the product to G' using the map $\varphi$,
while the right side means
first send a and b individually to G' using the map $\varphi$, then multiply their images in G'.


Intuitively, a homomorphism is a map that is compatible with the laws of composition in the
two groups, and it provides a way to relate different groups.


Examples 2.5.2. The following maps are homomorphisms: 


(a) the determinant function $\det: GL_n(\mathbb{R}) \to \mathbb{R}^\times$ (1.4.10),

(b) the sign homomorphism $\sigma: S_n \to \{\pm 1\}$ that sends a permutation to its sign (1.5.11),

(c) the exponential map $\exp: \mathbb{R}^+ \to \mathbb{R}^\times$ defined by $x \mapsto e^x$,

(d) the map $\varphi: \mathbb{Z}^+ \to G$ defined by $\varphi(n) = a^n$, where a is a given element of G,

(e) the absolute value map $|\ |: \mathbb{C}^\times \to \mathbb{R}^\times$.


In examples (c) and (d), the law of composition is written additively in the domain and
multiplicatively in the range. The condition (2.5.1) for a homomorphism must be rewritten
to take this into account. It becomes


$\varphi(a + b) = \varphi(a)\varphi(b)$.


The formula showing that the exponential map is a homomorphism is $e^{a+b} = e^a e^b$.


The following homomorphisms need to be mentioned, though they are less interesting. 
The trivial homomorphism $\varphi: G \to G'$ between any two groups maps every element of G to
the identity in G'. If H is a subgroup of G, the inclusion map $i: H \to G$ defined by i(x) = x
for x in H is a homomorphism.


Proposition 2.5.3 Let $\varphi: G \to G'$ be a group homomorphism.


(a) If $a_1, \ldots, a_k$ are elements of G, then $\varphi(a_1 \cdots a_k) = \varphi(a_1) \cdots \varphi(a_k)$.

(b) $\varphi$ maps the identity to the identity: $\varphi(1_G) = 1_{G'}$.

(c) $\varphi$ maps inverses to inverses: $\varphi(a^{-1}) = \varphi(a)^{-1}$.

Proof. The first assertion follows by induction from the definition. Next, since 1 · 1 = 1 and
since $\varphi$ is a homomorphism, $\varphi(1)\varphi(1) = \varphi(1 \cdot 1) = \varphi(1)$. We cancel $\varphi(1)$ from both sides
(2.2.3) to obtain $\varphi(1) = 1$. Finally, $\varphi(a^{-1})\varphi(a) = \varphi(a^{-1}a) = \varphi(1) = 1$. Hence $\varphi(a^{-1})$ is the
inverse of $\varphi(a)$. □


A group homomorphism determines two important subgroups: its image and its kernel. 


• The image of a homomorphism $\varphi: G \to G'$, often denoted by im $\varphi$, is simply the image of
$\varphi$ as a map of sets:


(2.5.4) im $\varphi = \{x \in G' \mid x = \varphi(a) \text{ for some } a \text{ in } G\}$.


Another notation for the image would be $\varphi(G)$.

The image of the map $\mathbb{Z}^+ \to G$ that sends $n \mapsto a^n$ is the cyclic subgroup <a> generated
by a.

The image of a homomorphism is a subgroup of the range. We will verify closure and
omit the other verifications. Let x and y be elements of the image. This means that there
are elements a and b in G such that $x = \varphi(a)$ and $y = \varphi(b)$. Since $\varphi$ is a homomorphism,
$xy = \varphi(a)\varphi(b) = \varphi(ab)$. So xy is equal to $\varphi$(something). It is in the image too.


• The kernel of a homomorphism is more subtle and also more important. The kernel of $\varphi$,
often denoted by ker $\varphi$, is the set of elements of G that are mapped to the identity in G':


(2.5.5) ker $\varphi = \{a \in G \mid \varphi(a) = 1\}$.


The kernel is a subgroup of G because, if a and b are in the kernel, then $\varphi(ab) = \varphi(a)\varphi(b) =
1 \cdot 1 = 1$, so ab is in the kernel, and so on.


The kernel of the determinant homomorphism $GL_n(\mathbb{R}) \to \mathbb{R}^\times$ is the special linear
group $SL_n(\mathbb{R})$ (2.2.11). The kernel of the sign homomorphism $S_n \to \{\pm 1\}$ is called the
alternating group. It consists of the even permutations, and is denoted by $A_n$:


(2.5.6) The alternating group $A_n$ is the group of even permutations.


The kernel is important because it controls the entire homomorphism. It tells us not 
only which elements of G are mapped to the identity in G’, but also which pairs of elements 
have the same image in G’. 


• If H is a subgroup of a group G and a is an element of G, the notation aH will stand for
the set of all products ah with h in H:


(2.5.7) $aH = \{g \in G \mid g = ah \text{ for some } h \text{ in } H\}$.


This set is called a left coset of H in G, the word “left” referring to the fact that the element
a appears on the left.


Proposition 2.5.8 Let $\varphi: G \to G'$ be a homomorphism of groups, and let a and b be
elements of G. Let K be the kernel of $\varphi$. The following conditions are equivalent:


• $\varphi(a) = \varphi(b)$,
• $a^{-1}b$ is in K,
• b is in the coset aK,
• The cosets bK and aK are equal.


Proof. Suppose that $\varphi(a) = \varphi(b)$. Then $\varphi(a^{-1}b) = \varphi(a^{-1})\varphi(b) = \varphi(a)^{-1}\varphi(b) = 1$.
Therefore $a^{-1}b$ is in the kernel K. To prove the converse, we turn this argument around.
If $a^{-1}b$ is in K, then $1 = \varphi(a^{-1}b) = \varphi(a)^{-1}\varphi(b)$, so $\varphi(a) = \varphi(b)$. This shows that the first
two bullets are equivalent. Their equivalence with the other bullets follows. □
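
The four conditions of Proposition 2.5.8 can be compared on a small example. The sketch below is an illustration under stated assumptions: it takes the sign homomorphism on $S_3$, with permutations of {0, 1, 2} stored as tuples and the sign computed by counting inversions; the helper names are hypothetical.

```python
from itertools import permutations


def compose(p, q):
    """The permutation p∘q (apply q first)."""
    return tuple(p[i] for i in q)


def inverse(p):
    """The inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def sign(p):
    """+1 for an even permutation, -1 for an odd one (by counting inversions)."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1


G = list(permutations(range(3)))
K = {a for a in G if sign(a) == 1}  # the kernel of the sign homomorphism


def coset(a):
    """The left coset aK."""
    return {compose(a, h) for h in K}


for a in G:
    for b in G:
        same_image = sign(a) == sign(b)
        assert same_image == (compose(inverse(a), b) in K)
        assert same_image == (b in coset(a))
        assert same_image == (coset(a) == coset(b))
```

Here the kernel has three elements and there are exactly two cosets, one for each value of the sign.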


Corollary 2.5.9 A homomorphism $\varphi: G \to G'$ is injective if and only if its kernel K is the
trivial subgroup {1} of G.

Proof. If K = {1}, Proposition 2.5.8 shows that $\varphi(a) = \varphi(b)$ only when $a^{-1}b = 1$, i.e., a = b.
Conversely, if $\varphi$ is injective, then the identity is the only element of G such that $\varphi(a) = 1$,
so K = {1}. □


The kernel of a homomorphism has another important property that is explained in 
the next proposition. If a and g are elements of a group G, the element $gag^{-1}$ is called the
conjugate of a by g.


Definition 2.5.10 A subgroup N of a group G is a normal subgroup if for every a in N and
every g in G, the conjugate $gag^{-1}$ is in N.


Proposition 2.5.11 The kernel of a homomorphism is a normal subgroup. 


Proof. If a is in the kernel of a homomorphism $\varphi: G \to G'$ and if g is any element of G,
then $\varphi(gag^{-1}) = \varphi(g)\varphi(a)\varphi(g^{-1}) = \varphi(g) 1 \varphi(g)^{-1} = 1$. Therefore $gag^{-1}$ is in the kernel
too. □


Thus the special linear group $SL_n(\mathbb{R})$ is a normal subgroup of the general linear group
$GL_n(\mathbb{R})$, and the alternating group $A_n$ is a normal subgroup of the symmetric group $S_n$.
Every subgroup of an abelian group is normal, because if G is abelian, then $gag^{-1} = a$ for
all a and all g in the group. But subgroups of nonabelian groups needn’t be normal. For
example, in the symmetric group $S_3$, with its usual presentation (2.2.7), the cyclic subgroup
<y> of order two is not normal, because y is in <y>, but $xyx^{-1} = x^2y$ isn’t in <y>.


• The center of a group G, which is often denoted by Z, is the set of elements that commute
with every element of G:


(2.5.12) $Z = \{z \in G \mid zx = xz \text{ for all } x \in G\}$.


It is always a normal subgroup of G. The center of the special linear group $SL_2(\mathbb{R})$ consists
of the two matrices I, −I. The center of the symmetric group $S_n$ is trivial if n ≥ 3.


Example 2.5.13 A homomorphism $\varphi: S_4 \to S_3$ between symmetric groups.
There are three ways to partition the set of four indices {1, 2, 3, 4} into pairs of subsets
of order two, namely


(2.5.14) $\Pi_1: \{1, 2\} \cup \{3, 4\}, \quad \Pi_2: \{1, 3\} \cup \{2, 4\}, \quad \Pi_3: \{1, 4\} \cup \{2, 3\}$.


An element of the symmetric group $S_4$ permutes the four indices, and by doing so it
also permutes these three partitions. This defines the map $\varphi$ from $S_4$ to the group of
permutations of the set $\{\Pi_1, \Pi_2, \Pi_3\}$, which is the symmetric group $S_3$. For example, the
4-cycle p = (1234) acts on subsets of order two as follows:


{1, 2} ↦ {2, 3}    {1, 3} ↦ {2, 4}    {1, 4} ↦ {1, 2}
{2, 3} ↦ {3, 4}    {2, 4} ↦ {1, 3}    {3, 4} ↦ {1, 4}.


Looking at this action, one sees that p acts on the set $\{\Pi_1, \Pi_2, \Pi_3\}$ of partitions as the
transposition $(\Pi_1\, \Pi_3)$ that fixes $\Pi_2$ and interchanges $\Pi_1$ and $\Pi_3$.

If p and q are elements of $S_4$, the product pq is the composed permutation p ∘ q,
and the action of pq on the set $\{\Pi_1, \Pi_2, \Pi_3\}$ is the composition of the actions of q and p.
Therefore $\varphi(pq) = \varphi(p)\varphi(q)$, and $\varphi$ is a homomorphism.

The map is surjective, so its image is the whole group $S_3$. Its kernel can be computed.
It is the subgroup of $S_4$ consisting of the identity and the three products of disjoint
transpositions:


(2.5.15) K = {1, (12)(34), (13)(24), (14)(23)}. □
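
The claims of Example 2.5.13 are small enough to verify by enumeration. The following Python sketch is one possible encoding, not taken from the text: elements of $S_4$ are stored as dictionaries on {1, 2, 3, 4}, elements of $S_3$ as tuples of indices 0, 1, 2 standing for $\Pi_1, \Pi_2, \Pi_3$, and the names `PARTITIONS`, `phi`, `compose` and `compose3` are hypothetical.

```python
from itertools import permutations

# The three partitions (2.5.14) of {1, 2, 3, 4} into two subsets of order two.
PARTITIONS = [
    frozenset({frozenset({1, 2}), frozenset({3, 4})}),
    frozenset({frozenset({1, 3}), frozenset({2, 4})}),
    frozenset({frozenset({1, 4}), frozenset({2, 3})}),
]


def phi(p):
    """Permutation of the partitions induced by p: t[m] is the index of p(Pi_m)."""
    image = []
    for partition in PARTITIONS:
        moved = frozenset(frozenset(p[i] for i in pair) for pair in partition)
        image.append(PARTITIONS.index(moved))
    return tuple(image)


def compose(p, q):
    """The product pq = p∘q in S_4 (apply q first)."""
    return {i: p[q[i]] for i in q}


def compose3(s, t):
    """The product s∘t of two permutations of the three partitions."""
    return tuple(s[t[m]] for m in range(3))


S4 = [dict(zip((1, 2, 3, 4), values)) for values in permutations((1, 2, 3, 4))]

# phi is a homomorphism: phi(pq) = phi(p)phi(q) for all p, q.
assert all(phi(compose(p, q)) == compose3(phi(p), phi(q)) for p in S4 for q in S4)

image = {phi(p) for p in S4}
kernel = [p for p in S4 if phi(p) == (0, 1, 2)]
print(len(image), len(kernel))  # 6 4
```

The 4-cycle p = (1234) is the dictionary {1: 2, 2: 3, 3: 4, 4: 1}, and `phi` sends it to (2, 1, 0), the transposition of $\Pi_1$ and $\Pi_3$.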


2.6 ISOMORPHISMS 


An isomorphism $\varphi: G \to G'$ from a group G to a group G' is a bijective group
homomorphism, that is, a bijective map such that $\varphi(ab) = \varphi(a)\varphi(b)$ for all a and b in G.


Examples 2.6.1 


• The exponential map $e^x$ is an isomorphism, when it is viewed as a map from the
additive group $\mathbb{R}^+$ to its image, the multiplicative group of positive real numbers.

• If a is an element of infinite order in a group G, the map sending $n \mapsto a^n$ is an
isomorphism from the additive group $\mathbb{Z}^+$ to the infinite cyclic subgroup <a> of G.

• The set P of n × n permutation matrices is a subgroup of $GL_n$, and the map $S_n \to P$
that sends a permutation to its associated matrix (1.5.7) is an isomorphism. □


Corollary 2.5.9 gives us a way to verify that a homomorphism $\varphi: G \to G'$ is an
isomorphism. To do so, we check that ker $\varphi$ = {1}, which implies that $\varphi$ is injective, and also
that im $\varphi$ = G', that is, $\varphi$ is surjective.


Lemma 2.6.2 If $\varphi: G \to G'$ is an isomorphism, the inverse map $\varphi^{-1}: G' \to G$ is also an
isomorphism.


Proof. The inverse of a bijective map is bijective. We must show that for all x and y in G',
$\varphi^{-1}(x)\varphi^{-1}(y) = \varphi^{-1}(xy)$. We set $a = \varphi^{-1}(x)$, $b = \varphi^{-1}(y)$, and $c = \varphi^{-1}(xy)$. What has to
be shown is that ab = c, and since $\varphi$ is bijective, it suffices to show that $\varphi(ab) = \varphi(c)$. Since
$\varphi$ is a homomorphism,


$\varphi(ab) = \varphi(a)\varphi(b) = xy = \varphi(c)$. □


This lemma shows that when $\varphi: G \to G'$ is an isomorphism, we can make a computation
in either group, then use $\varphi$ or $\varphi^{-1}$ to carry it over to the other. So, for computation with the
group law, the two groups have identical properties. To picture this conclusion intuitively, 
suppose that the elements of one of the groups are put into unlabeled boxes, and that 
we have an oracle that tells us, when presented with two boxes, which box contains their 
product. We will have no way to decide whether the elements in the boxes are from G or 
from G’. 

Two groups G and G' are said to be isomorphic if there exists an isomorphism $\varphi$ from
G to G'. We sometimes indicate that two groups are isomorphic by the symbol ≈:


(2.6.3) G ≈ G' means that G is isomorphic to G'.

Since isomorphic groups have identical properties, it is often convenient to identify them with
each other when speaking informally. For instance, we often blur the distinction between
the symmetric group $S_n$ and the isomorphic group P of permutation matrices.


• The groups isomorphic to a given group G form what is called the isomorphism class of G.


Any two groups in an isomorphism class are isomorphic. When one speaks of classifying
groups, what is meant is to describe these isomorphism classes. This is too hard to do for all
groups, but we will see that every group of prime order p is cyclic. So all groups of order 
p are isomorphic. There are two isomorphism classes of groups of order 4 (2.11.5) and five 
isomorphism classes of groups of order 12 (7.8.1). 


An interesting and sometimes confusing point about isomorphisms is that there exist 
isomorphisms $\varphi: G \to G$ from a group G to itself. Such an isomorphism is called an
automorphism. The identity map is an automorphism, of course, but there are nearly always
others. The most important type of automorphism is conjugation: Let g be a fixed element
of a group G. Conjugation by g is the map $\varphi$ from G to itself defined by


(2.6.4) $\varphi(x) = gxg^{-1}$.

This is an automorphism because, first of all, it is a homomorphism:

$\varphi(xy) = gxyg^{-1} = gxg^{-1}gyg^{-1} = \varphi(x)\varphi(y)$,


and second, it is bijective because it has an inverse function, namely conjugation by $g^{-1}$.

If the group is abelian, conjugation by any element g is the identity map: $gxg^{-1} = x$.
But any noncommutative group has nontrivial conjugations, and so it has automorphisms
different from the identity. For instance, in the symmetric group $S_3$, presented as usual,
conjugation by y interchanges x and $x^2$.
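
The statement about conjugation by y can be checked with explicit permutations. The sketch below is illustrative: it assumes x = (123) and y = (12) are stored as tuples on the indices 0, 1, 2, that products are composed as p ∘ q, and that the helper names are hypothetical.

```python
from itertools import permutations


def compose(p, q):
    """The permutation p∘q (apply q first)."""
    return tuple(p[i] for i in q)


def inverse(p):
    """The inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def conjugation(g):
    """The map a -> g a g^-1."""
    g_inv = inverse(g)
    return lambda a: compose(g, compose(a, g_inv))


S3 = list(permutations(range(3)))
x = (1, 2, 0)        # the cyclic permutation (123), written on indices 0, 1, 2
y = (1, 0, 2)        # the transposition (12)
x2 = compose(x, x)

phi = conjugation(y)

# phi is a homomorphism and a bijection of S_3, hence an automorphism.
assert all(phi(compose(a, b)) == compose(phi(a), phi(b)) for a in S3 for b in S3)
assert sorted(phi(a) for a in S3) == sorted(S3)

# Conjugation by y interchanges x and x^2.
assert phi(x) == x2 and phi(x2) == x
```

With this encoding one also finds compose(y, x) == compose(x2, y), which is the rule $yx = x^2y$ of (2.2.6).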

As was said before, the element $gxg^{-1}$ is the conjugate of x by g, and two elements
x and x' of a group G are conjugate if $x' = gxg^{-1}$ for some g in G. The conjugate $gxg^{-1}$
behaves in much the same way as the element x itself; for example, it has the same order in 
the group. This follows from the fact that it is the image of x by an automorphism. (See the 
discussion following Lemma 2.6.2.) 


Note: One may sometimes wish to determine whether or not two elements x and y of a 
group G are conjugate, i.e., whether or not there is an element g in G such that $y = gxg^{-1}$.
It is almost always simpler to rewrite the equation to be solved for g as yg = gx. □


• The commutator $aba^{-1}b^{-1}$ is another element associated to a pair a, b of elements of a
group.


The next lemma follows by moving things from one side of an equation to the other.

Lemma 2.6.5 Two elements a and b of a group commute, ab = ba, if and only if $aba^{-1} = b$,
and this is true if and only if $aba^{-1}b^{-1} = 1$. □


2.7 EQUIVALENCE RELATIONS AND PARTITIONS


A fundamental mathematical construction starts with a set S and forms a new set by equating
certain elements of S. For instance, we may divide the set of integers into two classes, the
even integers and the odd integers. The new set we obtain consists of two elements that
could be called Even and Odd. Or, it is common to view congruent triangles in the plane
as equivalent geometric objects. This very general procedure arises in several ways that we
discuss here.


• A partition Π of a set S is a subdivision of S into nonoverlapping, nonempty subsets:

(2.7.1) S = union of disjoint nonempty subsets.


The two sets Even and Odd partition the set of integers. With the usual notation,
the sets


(2.7.2) $\{1\}, \quad \{y, xy, x^2y\}, \quad \{x, x^2\}$


form a partition of the symmetric group $S_3$.


• An equivalence relation on a set S is a relation that holds between certain pairs of elements
of S. We may write it as a ~ b and speak of it as equivalence of a and b. An equivalence
relation is required to be:


(2.7.3)
• transitive: If a ~ b and b ~ c, then a ~ c.
• symmetric: If a ~ b, then b ~ a.
• reflexive: For all a, a ~ a.


Congruence of triangles is an example of an equivalence relation on the set of triangles 
in the plane. If A, B, and C are triangles, and if A is congruent to B and B is congruent to
C, then A is congruent to C, etc.

Conjugacy is an equivalence relation on a group. Two group elements are conjugate,
a ~ b, if $b = gag^{-1}$ for some group element g. We check transitivity: Suppose that a ~ b
and b ~ c. This means that $b = g_1 a g_1^{-1}$ and $c = g_2 b g_2^{-1}$ for some group elements $g_1$ and $g_2$.
Then $c = g_2(g_1 a g_1^{-1})g_2^{-1} = (g_2g_1)a(g_2g_1)^{-1}$, so a ~ c.


The concepts of a partition of S and an equivalence relation on S are logically 
equivalent, though in practice one may be presented with just one of the two. 


Proposition 2.7.4 An equivalence relation on a set S determines a partition of S, and 
conversely. 


Proof. Given a partition of S, the corresponding equivalence relation is defined by the rule 
that a ~ b if a and b lie in the same subset of the partition. The axioms for an equivalence
relation are obviously satisfied. Conversely, given an equivalence relation, one defines a
partition this way: The subset that contains a is the set of all elements b such that a ~ b. This
subset is called the equivalence class of a. We’ll denote it by $C_a$ here:


(2.7.5) $C_a = \{b \in S \mid a \sim b\}$.


The next lemma completes the proof of the proposition. □

Lemma 2.7.6 Given an equivalence relation on a set S, the subsets of S that are equivalence
classes partition S.


Proof. This is an important point, so we will check it carefully. We must remember that the
notation $C_a$ stands for a subset defined in a certain way. The partition consists of the subsets,
and several notations may describe the same subset.

The reflexive axiom tells us that a is in its equivalence class. Therefore the class $C_a$ is
nonempty, and since a can be any element, the union of the equivalence classes is the whole
set S. The remaining property of a partition that must be verified is that equivalence classes
are disjoint. To show this, we show:


(2.7.7) If $C_a$ and $C_b$ have an element in common, then $C_a = C_b$.


Since we can interchange the roles of a and b, it will suffice to show that if $C_a$ and $C_b$ have
an element, say d, in common, then $C_b \subset C_a$, i.e., any element x of $C_b$ is also in $C_a$. If x is
in $C_b$, then b ~ x. Since d is in both sets, a ~ d and b ~ d, and the symmetry property tells
us that d ~ b. So we have a ~ d, d ~ b, and b ~ x. Two applications of transitivity show that
a ~ x, and therefore that x is in $C_a$. □


For example, the relation on a group defined by a ~ b if a and b are elements of the 
same order is an equivalence relation. The corresponding partition is exhibited in (2.7.2) for 
the symmetric group S3. 
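
The passage from an equivalence relation to its partition can be carried out mechanically on a finite set. The sketch below is an illustration with hypothetical helper names: `equivalence_classes` builds the classes of any relation supplied as a function, and it is applied to the relation "same order" on $S_3$, with x = (123) and y = (12) encoded as tuples on the indices 0, 1, 2.

```python
def compose(p, q):
    """The permutation p∘q (apply q first)."""
    return tuple(p[i] for i in q)


def order(p):
    """Smallest positive m with p^m equal to the identity."""
    identity = tuple(range(len(p)))
    m, q = 1, p
    while q != identity:
        q = compose(p, q)
        m += 1
    return m


def equivalence_classes(elements, related):
    """Partition `elements` into the classes of the equivalence relation `related`."""
    classes = []
    for a in elements:
        for cls in classes:
            # By symmetry and transitivity, one representative per class suffices.
            if related(cls[0], a):
                cls.append(a)
                break
        else:
            classes.append([a])
    return classes


x = (1, 2, 0)
y = (1, 0, 2)
x2 = compose(x, x)
S3 = {"1": (0, 1, 2), "x": x, "x^2": x2,
      "y": y, "xy": compose(x, y), "x^2y": compose(x2, y)}


def same_order(a, b):
    return order(S3[a]) == order(S3[b])


classes = equivalence_classes(list(S3), same_order)
print(classes)  # [['1'], ['x', 'x^2'], ['y', 'xy', 'x^2y']]
```

The three classes are the three subsets listed in (2.7.2).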

If a partition of a set S is given, we may construct a new set $\bar{S}$ whose elements are
the subsets. We imagine putting the subsets into separate piles, and we regard the piles as
the elements of our new set $\bar{S}$. It seems advisable to have a notation to distinguish a subset
from the element of the set $\bar{S}$ (the pile) that it represents. If U is a subset, we will denote by
[U] the corresponding element of $\bar{S}$. Thus if S is the set of integers and if Even and Odd
denote the subsets of even and odd integers, respectively, then $\bar{S}$ contains the two elements
[Even] and [Odd].

We will use this notation more generally. When we want to regard a subset U of S as
an element of a set of subsets of S, we denote it by [U].

When an equivalence relation on S is given, the equivalence classes form a partition,
and we obtain a new set $\bar{S}$ whose elements are the equivalence classes $[C_a]$. We can think of
the elements of this new set in another way, as the set obtained by changing what we mean
by equality among elements. If a and b are in S, we interpret a ~ b to mean that a and b
become equal in $\bar{S}$, because $C_a = C_b$. With this way of looking at it, the difference between
the two sets S and $\bar{S}$ is that in $\bar{S}$ more elements have been declared “equal,” i.e., equivalent.
It seems to me that we often treat congruent triangles this way in school.


For any equivalence relation, there is a natural surjective map

(2.7.8) $\pi: S \to \bar{S}$


that maps an element a of S to its equivalence class: $\pi(a) = [C_a]$. When we want to regard
$\bar{S}$ as the set obtained from S by changing the notion of equality, it will be convenient to
denote the element $[C_a]$ of $\bar{S}$ by the symbol $\bar{a}$. Then the map π becomes


$\pi(a) = \bar{a}$.

We can work in $\bar{S}$ with the symbols used for elements of S, but with bars over them to
remind us of the new rule:


(2.7.9) If a and b are in S, then $\bar{a} = \bar{b}$ means a ~ b.


A disadvantage of this bar notation is that many symbols represent the same element
of S̅. Sometimes this disadvantage can be overcome by choosing a particular element, a
representative element, in each equivalence class. For example, the even and the odd integers
are often represented by 0̅ and 1̅:

(2.7.10) {[Even], [Odd]} = {0̅, 1̅}.

Though the pile picture may be easier to grasp at first, the second way of viewing S̅ is often
better because the bar notation is easier to manipulate algebraically.


The Equivalence Relation Defined by a Map 


Any map of sets f: S → T gives us an equivalence relation on its domain S. It is defined by
the rule a ~ b if f(a) = f(b).

• The inverse image of an element t of T is the subset of S consisting of all elements s such
that f(s) = t. It is denoted symbolically as

(2.7.11) f⁻¹(t) = {s ∈ S | f(s) = t}.

This is symbolic notation. Please remember that unless f is bijective, f⁻¹ will not be a map.
The inverse images are also called the fibres of the map f, and the fibres that are not empty
are the equivalence classes for the relation defined above.

Here the set S̅ of equivalence classes has another incarnation, as the image of the map.
The elements of the image correspond bijectively to the nonempty fibres, which are the 
equivalence classes. 
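
As a small illustration of how a map sorts its domain into fibres, the following Python sketch groups a finite set by the values of a map. The helper name `fibres` and the choice of the absolute value on the integers from -3 to 3 are assumptions made for this example; they are not notation from the text.

```python
from collections import defaultdict

def fibres(domain, f):
    """Return the nonempty fibres of f as a dict: value t -> list of s with f(s) == t."""
    classes = defaultdict(list)
    for s in domain:
        classes[f(s)].append(s)
    return dict(classes)

# Hypothetical example: the absolute value map on the integers -3, ..., 3.
S = range(-3, 4)
fib = fibres(S, abs)

# The keys are the elements of the image; the values are the equivalence classes.
image = set(fib)
for t in sorted(fib):
    print(t, fib[t])
```

The keys of `fib` are the elements of the image and the values are the equivalence classes, which mirrors the bijective correspondence between the image and the nonempty fibres.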


(2.7.12) Some Fibres of the Absolute Value Map C× → R×.


Example 2.7.13 If G is a finite group, we can define a map f: G → N to the set {1, 2, 3, ...}
of natural numbers, letting f(a) be the order of the element a of G. The fibres of this map
are the sets of elements with the same order (see (2.7.2), for example). □

We go back to a group homomorphism φ: G → G′. The equivalence relation on G
defined by φ is usually denoted by ≡, rather than by ~, and is referred to as congruence:

(2.7.14) a ≡ b if φ(a) = φ(b).

We have seen that elements a and b of G are congruent, i.e., φ(a) = φ(b), if and only if b is
in the coset aK of the kernel K (2.5.8).


Proposition 2.7.15 Let K be the kernel of a homomorphism φ: G → G′. The fibre of φ that
contains an element a of G is the coset aK of K. These cosets partition the group G, and
they correspond to elements of the image of φ. □


(2.7.16) A Schematic Diagram of a Group Homomorphism. 


2.8 COSETS 


As before, if H is a subgroup of G and if a is an element of G, the subset

(2.8.1) aH = {ah | h in H}

is called a left coset. The subgroup H is a particular left coset because H = 1H.
The cosets of H in G are equivalence classes for the congruence relation

(2.8.2) a ≡ b if b = ah for some h in H.

This is very simple, but let’s verify that congruence is an equivalence relation.

Transitivity: Suppose that a ≡ b and b ≡ c. This means that b = ah and c = bh′ for some
elements h and h′ of H. Therefore c = ahh′. Since H is a subgroup, hh′ is in H, and thus
a ≡ c.

Symmetry: Suppose a ≡ b, so that b = ah. Then a = bh⁻¹ and h⁻¹ is in H, so b ≡ a.

Reflexivity: a = a1 and 1 is in H, so a ≡ a.

Notice that we have made use of all the defining properties of a subgroup here: closure,
inverses, and identity.

Corollary 2.8.3 The left cosets of a subgroup H of a group G partition the group.

Proof. The left cosets are the equivalence classes for the congruence relation (2.8.2). □


Keep in mind that the notation aH defines a certain subset of G. As with any
equivalence relation, several notations may define the same subset. For example, in the
symmetric group S3, with the usual presentation (2.2.6), the element y generates a cyclic
subgroup H = ⟨y⟩ of order 2. There are three left cosets of H in G:

(2.8.4) H = {1, y} = yH, xH = {x, xy} = xyH, x²H = {x², x²y} = x²yH.

These sets do partition the group.
Recapitulating, let H be a subgroup of a group G and let a and b be elements of G.
The following are equivalent:

(2.8.5)
• b = ah for some h in H, or, a⁻¹b is an element of H,
• b is an element of the left coset aH,
• the left cosets aH and bH are equal.

The number of left cosets of a subgroup is called the index of H in G. The index is
denoted by

(2.8.6) [G : H].

Thus the index of the subgroup ⟨y⟩ of S3 is 3. When G is infinite, the index may be infinite
too.


Lemma 2.8.7 All left cosets aH of a subgroup H of a group G have the same order. 


Proof. Multiplication by a defines a map H → aH that sends h ⇝ ah. This map is bijective
because its inverse is multiplication by a⁻¹. □


Since the cosets all have the same order, and since they partition the group, we obtain
the important Counting Formula

(2.8.8) |G| = |H| [G : H]
(order of G) = (order of H)(number of cosets),

where, as always, |G| denotes the order of the group. The equality has the obvious meaning
if some terms are infinite. For the subgroup ⟨y⟩ of S3, the formula reads 6 = 2 · 3.
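
The coset partition and the Counting Formula can be checked by brute force in S3. The sketch below is only an illustration: it assumes permutations of {0, 1, 2} stored as tuples, and the names `compose`, `G`, `H`, and `left_cosets` are choices made for this example.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q of {0, 1, 2}: apply q first, then p."""
    return tuple(p[i] for i in q)

G = list(permutations(range(3)))   # the six elements of S3
e = (0, 1, 2)                      # identity
y = (1, 0, 2)                      # a transposition, standing in for y
H = [e, y]                         # cyclic subgroup of order 2

# Each left coset aH, stored as a frozenset so that equal cosets coincide.
left_cosets = {frozenset(compose(a, h) for h in H) for a in G}

index = len(left_cosets)
print(index, [sorted(c) for c in left_cosets])
```

Collecting the cosets in a set removes the duplicates aH = bH, so the number of distinct cosets is the index, and the cosets cover G without overlap.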


It follows from the counting formula that the terms on the right side of (2.8.8) divide 
the left side. One of these facts is called Lagrange’s Theorem: 


Theorem 2.8.9 Lagrange’s Theorem. Let H be a subgroup of a finite group G. The order of
H divides the order of G. □

Corollary 2.8.10 The order of an element of a finite group divides the order of the group.

Proof. The order of an element a of a group G is equal to the order of the cyclic subgroup
⟨a⟩ generated by a (Proposition 2.4.2). □

Corollary 2.8.11 Suppose that a group G has prime order p. Let a be any element of G
other than the identity. Then G is the cyclic group ⟨a⟩ generated by a.

Proof. The order of an element a ≠ 1 is greater than 1 and it divides the order of G, which
is the prime integer p. So the order of a is equal to p. This is also the order of the cyclic
subgroup ⟨a⟩ generated by a. Since G has order p, ⟨a⟩ = G. □


This corollary classifies groups of prime order p. They form one isomorphism class, the class 
of the cyclic groups of order p. 


The counting formula can also be applied when a homomorphism φ: G → G′ is given.
As we have seen (2.7.15), the left cosets of the kernel ker φ are the nonempty fibres of the
map φ. They are in bijective correspondence with the elements of the image.

(2.8.12) [G : ker φ] = |im φ|.

Corollary 2.8.13 Let φ: G → G′ be a homomorphism of finite groups. Then

• |G| = |ker φ| · |im φ|,
• |ker φ| divides |G|, and
• |im φ| divides both |G| and |G′|.

Proof. The first formula is obtained by combining (2.8.8) and (2.8.12), and it implies that
|ker φ| and |im φ| divide |G|. Since the image is a subgroup of G′, Lagrange’s theorem tells
us that its order divides |G′| too. □

For example, the sign homomorphism σ: S_n → {±1} (2.5.2)(b) is surjective, so its
image has order 2. Its kernel, the alternating group A_n, has order n!/2. Half of the elements
of S_n are even permutations, and half are odd permutations.
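
The formula |G| = |ker φ| · |im φ| can be verified numerically for the sign homomorphism. The sketch below makes two assumptions that are not taken from the text: the sign is computed by counting inversions, and n = 4 is chosen only to keep the enumeration small.

```python
from itertools import permutations
from math import factorial

def sign(p):
    """Sign of a permutation of {0, ..., n-1}, computed by counting inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

n = 4                                      # illustrative choice
G = list(permutations(range(n)))           # the symmetric group S_n
kernel = [p for p in G if sign(p) == 1]    # the even permutations, A_n
image = {sign(p) for p in G}               # {1, -1} when n >= 2

print(len(G), len(kernel), len(image))
```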


The Counting Formula 2.8.8 has an analogue when a chain of subgroups is given. 


Proposition 2.8.14 Multiplicative Property of the Index. Let G ⊃ H ⊃ K be subgroups of
a group G. Then [G : K] = [G : H][H : K].

Proof. We will assume that the two indices on the right are finite, say [G : H] = m and
[H : K] = n. The reasoning when one or the other is infinite is similar. We list the m cosets
of H in G, choosing representative elements for each coset, say as g_1H, ..., g_mH. Then
g_1H ∪ ··· ∪ g_mH is a partition of G. Similarly, we choose representative elements for each
coset of K in H, obtaining a partition H = h_1K ∪ ··· ∪ h_nK. Since multiplication by g_i is
an invertible operation, g_iH = g_ih_1K ∪ ··· ∪ g_ih_nK will be a partition of the coset g_iH.
Putting these partitions together, G is partitioned into the mn cosets g_ih_jK. □


Right Cosets 


Let us go back to the definition of cosets. We made the decision to work with left cosets aH.
One can also define right cosets of a subgroup H and repeat the above discussion for them.
The right cosets of a subgroup H of a group G are the sets

(2.8.15) Ha = {ha | h ∈ H}.

They are equivalence classes for the relation (right congruence)

a ≡ b if b = ha, for some h in H.

Right cosets also partition the group G, but they aren’t always the same as left cosets. For
instance, the right cosets of the subgroup ⟨y⟩ of S3 are

(2.8.16) H = {1, y} = Hy, Hx = {x, x²y} = Hx²y, Hx² = {x², xy} = Hxy.

This isn’t the same as the partition (2.8.4) into left cosets. However, if a subgroup is normal,
its right and left cosets are equal.
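
To see the difference between the two partitions concretely, one can compute both families of cosets in S3. This is a sketch under assumptions: permutations are tuples on {0, 1, 2}, the tuples chosen for x and y are one possible realization of the usual generators, and the helper names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q: apply q first, then p."""
    return tuple(p[i] for i in q)

G = list(permutations(range(3)))
e, x, y = (0, 1, 2), (1, 2, 0), (1, 0, 2)   # identity, a 3-cycle, a transposition

def left_cosets(H):
    return {frozenset(compose(g, h) for h in H) for g in G}

def right_cosets(H):
    return {frozenset(compose(h, g) for h in H) for g in G}

H = [e, y]                  # the subgroup generated by y (not normal)
N = [e, x, compose(x, x)]   # the subgroup generated by x (normal)

print(left_cosets(H) == right_cosets(H))   # the two partitions differ
print(left_cosets(N) == right_cosets(N))   # the two partitions agree
```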


Proposition 2.8.17 Let H be a subgroup of a group G. The following conditions are
equivalent:

(i) H is a normal subgroup: For all h in H and all g in G, ghg⁻¹ is in H.
(ii) For all g in G, gHg⁻¹ = H.
(iii) For all g in G, the left coset gH is equal to the right coset Hg.
(iv) Every left coset of H in G is a right coset.

Proof. The notation gHg⁻¹ stands for the set of all elements ghg⁻¹, with h in H.

Suppose that H is normal. So (i) holds, and it implies that gHg⁻¹ ⊂ H for all g in G.
Substituting g⁻¹ for g shows that g⁻¹Hg ⊂ H as well. We multiply this inclusion on the left
by g and on the right by g⁻¹ to conclude that H ⊂ gHg⁻¹. Therefore gHg⁻¹ = H. This
shows that (i) implies (ii). It is clear that (ii) implies (i). Next, if gHg⁻¹ = H, we multiply
this equation on the right by g to conclude that gH = Hg. This shows that (ii) implies (iii).
One sees similarly that (iii) implies (ii). Since (iii) implies (iv) is obvious, it remains only to
check that (iv) implies (iii).

We ask: Under what circumstances can a left coset be equal to a right coset? We recall
that the right cosets partition the group G, and we note that the left coset gH and the right
coset Hg have an element in common, namely g = g · 1 = 1 · g. So if the left coset gH is
equal to any right coset, that coset must be Hg. □


Proposition 2.8.18

(a) If H is a subgroup of a group G and g is an element of G, the set gHg⁻¹ is also a
subgroup.

(b) If a group G has just one subgroup H of order r, then that subgroup is normal.

Proof. (a) Conjugation by g is an automorphism of G (see (2.6.4)), and gHg⁻¹ is the image
of H. (b) See (2.8.17): gHg⁻¹ is a subgroup of order r. □

Note: If H is a subgroup of a finite group G, the counting formulas using right cosets or left
cosets are the same, so the number of left cosets is equal to the number of right cosets. This
is also true when G is infinite, though the proof can’t be made by counting (see Exercise
M.8). □

2.9 MODULAR ARITHMETIC


This section contains a brief discussion of one of the most important concepts in number 
theory, congruence of integers. If you have not run across this concept before, you will want 
to read more about it. See, for instance, [Stark]. We work with a fixed positive integer n 
throughout the section. 


• Two integers a and b are said to be congruent modulo n,

(2.9.1) a ≡ b modulo n,

if n divides b − a, or if b = a + nk for some integer k. For instance, 2 ≡ 17 modulo 5.

It is easy to check that congruence is an equivalence relation, so we may consider
the equivalence classes, called congruence classes, that it defines. We use bar notation, and
denote the congruence class of an integer a modulo n by the symbol a̅. This congruence
class is the set of integers

(2.9.2) a̅ = {..., a − n, a, a + n, a + 2n, ...}.

If a and b are integers, the equation a̅ = b̅ means that a ≡ b modulo n, or that n divides
b − a. The congruence class 0̅ is the subgroup

0̅ = Zn = {..., −n, 0, n, 2n, ...} = {kn | k ∈ Z}

of the additive group Z⁺. The other congruence classes are the cosets of this subgroup.
Please note that Zn is not a right coset; it is a subgroup of Z⁺. The notation for a coset of
a subgroup H analogous to aH, but using additive notation for the law of composition, is
a + H = {a + h | h ∈ H}. To simplify notation, we denote the subgroup Zn by H. Then
the cosets of H, the congruence classes, are the sets

(2.9.3) a + H = {a + kn | k ∈ Z}.

The n integers 0, 1, ..., n − 1 are representative elements for the n congruence classes.

Proposition 2.9.4 There are n congruence classes modulo n, namely 0̅, 1̅, ..., n̅−̅1̅. The
index [Z : Zn] of the subgroup Zn in Z is n. □


Let a̅ and b̅ be congruence classes represented by integers a and b. Their sum is defined
to be the congruence class of a + b, and their product is the class of ab. In other words, by
definition,

(2.9.5) a̅ + b̅ = a̅+̅b̅ and a̅ · b̅ = a̅b̅.

This definition needs some justification, because the same congruence class can be
represented by many different integers. Any integer a′ congruent to a modulo n represents the
same class as a does. So it had better be true that if a′ ≡ a and b′ ≡ b, then a′ + b′ ≡ a + b
and a′b′ ≡ ab. Fortunately, this is so.

Lemma 2.9.6 If a′ ≡ a and b′ ≡ b modulo n, then a′ + b′ ≡ a + b and a′b′ ≡ ab
modulo n.

Proof. Assume that a′ ≡ a and b′ ≡ b, so that a′ = a + rn and b′ = b + sn for some
integers r and s. Then a′ + b′ = a + b + (r + s)n. This shows that a′ + b′ ≡ a + b. Similarly,
a′b′ = (a + rn)(b + sn) = ab + (as + rb + rns)n, so a′b′ ≡ ab. □
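
The content of Lemma 2.9.6 can also be spot-checked by machine. The sketch below is an illustration only, with assumptions of its own: the modulus n = 7, the helper `cls` returning the representative in 0, ..., n − 1, and the small ranges for r and s are arbitrary choices, and a finite check of course does not replace the proof.

```python
n = 7  # illustrative modulus

def cls(a):
    """Standard representative (in 0, ..., n-1) of the congruence class of a modulo n."""
    return a % n

# Replace a and b by other representatives a + r*n and b + s*n and compare classes.
well_defined = all(
    cls((a + r * n) + (b + s * n)) == cls(a + b)
    and cls((a + r * n) * (b + s * n)) == cls(a * b)
    for a in range(n) for b in range(n)
    for r in range(-2, 3) for s in range(-2, 3)
)
print(well_defined)
```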


The associative, commutative, and distributive laws hold for addition and multiplication 
of congruence classes because they hold for addition and multiplication of integers. For 
example, the distributive law is verified as follows: 


a̅(b̅ + c̅) = a̅ · (b̅+̅c̅) = a̅(̅b̅+̅c̅)̅ (definition of + and × for congruence classes)
= a̅b̅+̅a̅c̅ (distributive law in the integers)
= a̅b̅ + a̅c̅ = a̅ · b̅ + a̅ · c̅ (definition of + and × for congruence classes).


The verifications of other laws are similar, and we omit them. 


The set of congruence classes modulo n may be denoted by any one of the symbols
Z/Zn, Z/nZ, or Z/(n). Addition, subtraction, and multiplication in Z/Zn can be made
explicit by working with integers and taking remainders after division by n. That is what the
formulas (2.9.5) mean. They tell us that the map

(2.9.7) Z → Z/Zn

that sends an integer a to its congruence class a̅ is compatible with addition and multiplication.
Therefore computations can be made in the integers and then carried over to Z/Zn at the
end. However, computations are simpler if the numbers are kept small. This can be done by
computing the remainder after some part of a computation has been made.

Thus if n = 29, so that Z/Zn = {0̅, 1̅, 2̅, ..., 2̅8̅}, then (3̅5̅)(1̅7̅ + 7̅) can be computed
as 3̅5̅ · 2̅4̅ = 6̅ · (−̅5̅) = −̅3̅0̅ = −̅1̅.
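
The computation above can be reproduced with integer remainders. This is a minimal sketch; the variable names are ours, and it relies on the fact that Python's `%` operator returns a remainder in 0, ..., n − 1 when n is positive, so the class of −1 appears as 28.

```python
n = 29

# Compute in the integers and reduce only at the end.
direct = (35 * (17 + 7)) % n

# Reduce the factors first: 35 becomes 6, and 17 + 7 = 24.
reduced_first = ((35 % n) * ((17 + 7) % n)) % n

# Use the small representative -5 for the class of 24: 6 * (-5) = -30.
with_negative = (6 * -5) % n

print(direct, reduced_first, with_negative, (-1) % n)
```

All three routes give the same class, which is the compatibility with addition and multiplication expressed by (2.9.5).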

In the long run, the bars over the numbers become a nuisance. They are often left off. 
When omitting bars, one just has to remember this rule: 


(2.9.8) To say a = b in Z/Zn means that a ≡ b modulo n.


Congruences modulo a prime integer have special properties, which we discuss at the 
beginning of the next chapter. 

2.10 THE CORRESPONDENCE THEOREM

Let φ: G → 𝒢 be a group homomorphism, and let H be a subgroup of G. We may restrict φ
to H, obtaining a homomorphism

(2.10.1) φ|_H : H → 𝒢.

This means that we take the same map φ but restrict its domain: So by definition, if h is in
H, then [φ|_H](h) = φ(h). (We’ve added brackets around the symbol φ|_H for clarity.) The
restriction is a homomorphism because φ is one, and the kernel of φ|_H is the intersection of
the kernel of φ with H:

(2.10.2) ker(φ|_H) = (ker φ) ∩ H.

This is clear from the definition of the kernel. The image of φ|_H is the same as the image
φ(H) of H under the map φ.

The Counting Formula may help to describe the restriction. According to Corollary
(2.8.13), the order of the image divides both |H| and |𝒢|. If |H| and |𝒢| have no common
factor, φ(H) = {1}, so H is contained in the kernel.


Example 2.10.3 The image of the sign homomorphism σ: S_n → {±1} has order 2. If a
subgroup H of the symmetric group S_n has odd order, it will be contained in the kernel
of σ, the alternating group A_n of even permutations. This will be so when H is the cyclic
subgroup generated by a permutation q that is an element of odd order in the group. Every
permutation whose order in the group is odd, such as an n-cycle with n odd, is an even
permutation. A permutation that has even order in the group may be odd or even. □
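
The two statements at the end of the example can be confirmed by enumeration in a small symmetric group. The sketch below assumes permutations of {0, ..., 4} stored as tuples and a sign computed by counting inversions; the choice of S5 and all helper names are ours.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q: apply q first, then p."""
    return tuple(p[i] for i in q)

def order(p):
    """Smallest k >= 1 such that the k-th power of p is the identity."""
    identity = tuple(range(len(p)))
    k, power = 1, p
    while power != identity:
        power = compose(p, power)
        k += 1
    return k

def sign(p):
    """Sign of p, computed by counting inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

S5 = list(permutations(range(5)))
signs_of_odd_order = {sign(p) for p in S5 if order(p) % 2 == 1}
signs_of_even_order = {sign(p) for p in S5 if order(p) % 2 == 0}

print(signs_of_odd_order, signs_of_even_order)
```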


Proposition 2.10.4 Let φ: G → 𝒢 be a homomorphism with kernel K and let ℋ be a
subgroup of 𝒢. Denote the inverse image φ⁻¹(ℋ) by H. Then H is a subgroup of G that
contains K. If ℋ is a normal subgroup of 𝒢, then H is a normal subgroup of G. If φ is
surjective and if H is a normal subgroup of G, then ℋ is a normal subgroup of 𝒢.

For example, let φ denote the determinant homomorphism GL_n(R) → R×. The set of
positive real numbers is a subgroup of R×; it is normal because R× is abelian. Its inverse
image, the set of invertible matrices with positive determinant, is a normal subgroup of
GL_n(R).


Proof. This proof is simple, but we must keep in mind that φ⁻¹ is not a map. By definition,
φ⁻¹(ℋ) = H is the set of elements x of G such that φ(x) is in ℋ. First, if x is in the kernel
K, then φ(x) = 1. Since 1 is in ℋ, x is in H. Thus H contains K. We verify the conditions
for a subgroup.

Closure: Suppose that x and y are in H. Then φ(x) and φ(y) are in ℋ. Since ℋ is a subgroup,
φ(x)φ(y) is in ℋ. Since φ is a homomorphism, φ(x)φ(y) = φ(xy). So φ(xy) is in ℋ, and
xy is in H.

Identity: 1 is in H because φ(1) = 1 is in ℋ.

Inverses: Let x be an element of H. Then φ(x) is in ℋ, and since ℋ is a subgroup, φ(x)⁻¹
is also in ℋ. Since φ is a homomorphism, φ(x)⁻¹ = φ(x⁻¹), so φ(x⁻¹) is in ℋ, and x⁻¹ is
in H.

Suppose that ℋ is a normal subgroup. Let x and g be elements of H and G, respectively.
Then φ(gxg⁻¹) = φ(g)φ(x)φ(g)⁻¹ is a conjugate of φ(x), and φ(x) is in ℋ. Because
ℋ is normal, φ(gxg⁻¹) is in ℋ, and therefore gxg⁻¹ is in H.

Suppose that φ is surjective, and that H is a normal subgroup of G. Let a be in
ℋ, and let b be in 𝒢. There are elements x of H and y of G such that φ(x) = a
and φ(y) = b. Since H is normal, yxy⁻¹ is in H, and therefore φ(yxy⁻¹) = bab⁻¹ is
in ℋ. □


Theorem 2.10.5 Correspondence Theorem. Let φ: G → 𝒢 be a surjective group
homomorphism with kernel K. There is a bijective correspondence between subgroups of 𝒢 and
subgroups of G that contain K:

{subgroups of G that contain K} ←→ {subgroups of 𝒢}.

This correspondence is defined as follows:

a subgroup H of G that contains K ⇝ its image φ(H) in 𝒢,
a subgroup ℋ of 𝒢 ⇝ its inverse image φ⁻¹(ℋ) in G.

If H and ℋ are corresponding subgroups, then H is normal in G if and only if ℋ is normal
in 𝒢.
If H and ℋ are corresponding subgroups, then |H| = |ℋ||K|.


Example 2.10.6 We go back to the homomorphism φ: S4 → S3 that was defined in Example
2.5.13, and its kernel K (2.5.15).

The group S3 has six subgroups, four of them proper. With the usual presentation,
there is one proper subgroup of order 3, the cyclic group ⟨x⟩, and there are three subgroups
of order 2, including ⟨y⟩. The Correspondence Theorem tells us that there are four proper
subgroups of S4 that contain K. Since |K| = 4, there is one subgroup of order 12 and there
are three of order 8.

We know a subgroup of order 12, namely the alternating group A4. That is the subgroup
that corresponds to the cyclic group ⟨x⟩ of S3.

The subgroups of order 8 can be explained in terms of symmetries of a square. With
vertices of the square labeled as in the figure below, a counterclockwise rotation through
the angle π/2 corresponds to the 4-cycle (1234). Reflection about the diagonal through the
vertex 1 corresponds to the transposition (24). These two permutations generate a subgroup
of order 8. The other subgroups of order 8 can be obtained by labeling the vertices in
other ways.


2 1 


3 4 


There are also some subgroups of S4 that do not contain K. The Correspondence
Theorem has nothing to say about those subgroups. □


Proof of the Correspondence Theorem. Let H be a subgroup of G that contains K, and let
ℋ be a subgroup of 𝒢. We must check the following points:

• φ(H) is a subgroup of 𝒢.
• φ⁻¹(ℋ) is a subgroup of G, and it contains K.
• ℋ is a normal subgroup of 𝒢 if and only if φ⁻¹(ℋ) is a normal subgroup of G.
• (bijectivity of the correspondence) φ(φ⁻¹(ℋ)) = ℋ and φ⁻¹(φ(H)) = H.
• |φ⁻¹(ℋ)| = |ℋ||K|.

Since φ(H) is the image of the homomorphism φ|_H, it is a subgroup of 𝒢. The second and
third bullets form Proposition 2.10.4.

Concerning the fourth bullet, the equality φ(φ⁻¹(ℋ)) = ℋ is true for any surjective
map of sets φ: S → S′ and any subset ℋ of S′. Also, H ⊂ φ⁻¹(φ(H)) is true for any map
φ of sets and any subset H of S. We omit the verification of these facts. Then the only
thing remaining to be verified is that H ⊃ φ⁻¹(φ(H)). Let x be an element of φ⁻¹(φ(H)).
We must show that x is in H. By definition of the inverse image, φ(x) is in φ(H), say
φ(x) = φ(a), with a in H. Then a⁻¹x is in the kernel K (2.5.8), and since H contains K,
a⁻¹x is in H. Since both a and a⁻¹x are in H, x is in H too.

We leave the proof of the last bullet as an exercise. □


2.11 PRODUCT GROUPS

Let G, G′ be two groups. The product set G × G′, the set of pairs of elements (a, a′) with
a in G and a′ in G′, can be made into a group by component-wise multiplication; that is,
multiplication of pairs is defined by the rule

(2.11.1) (a, a′) · (b, b′) = (ab, a′b′).

The pair (1, 1) is the identity, and the inverse of (a, a′) is (a⁻¹, a′⁻¹). The associative law in
G × G′ follows from the fact that it holds in G and in G′.

The group obtained in this way is called the product of G and G′ and is denoted by
G × G′. It is related to the two factors G and G′ in a simple way that we can sum up in terms
of some homomorphisms

(2.11.2) G --i--> G × G′ --p--> G, G′ --i′--> G × G′ --p′--> G′.

They are defined by i(x) = (x, 1), i′(x′) = (1, x′), p(x, x′) = x, p′(x, x′) = x′. The
injective homomorphisms i and i′ may be used to identify G and G′ with their images, the
subgroups G × 1 and 1 × G′ of G × G′. The maps p and p′ are surjective, the kernel of p is
1 × G′, and the kernel of p′ is G × 1. These are the projections.

It is obviously desirable to decompose a given group G as a product, that is, to find
groups H and H′ such that G is isomorphic to the product H × H′. The groups H and H′
will be simpler, and the relation between H × H′ and its factors is easily understood. It is
rare that a group is a product, but it does happen occasionally.

For example, it is rather surprising that a cyclic group of order 6 can be decomposed:
A cyclic group C6 of order 6 is isomorphic to the product C2 × C3 of cyclic groups of orders
2 and 3. To see this, say that C2 = ⟨y⟩ and C3 = ⟨z⟩, with y² = 1 and z³ = 1, and let x
denote the element (y, z) of the product group C2 × C3. The smallest positive integer k such
that x^k = (y^k, z^k) is the identity (1, 1) is k = 6. So x has order 6. Since C2 × C3 also has
order 6, it is equal to the cyclic group ⟨x⟩. The powers of x, in order, are

(1, 1), (y, z), (1, z²), (y, 1), (1, z), (y, z²). □

There is an analogous statement for a cyclic group of order rs, whenever the two
integers r and s have no common factor.

Proposition 2.11.3 Let r and s be relatively prime integers. A cyclic group of order rs is
isomorphic to the product of a cyclic group of order r and a cyclic group of order s. □

On the other hand, a cyclic group of order 4 is not isomorphic to a product of two cyclic
groups of order 2. Every element of C2 × C2 has order 1 or 2, whereas a cyclic group of order
4 contains two elements of order 4.
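
Both the proposition and the counterexample can be explored numerically. The sketch below models the cyclic groups additively as Z/r and Z/s, which is an assumption of this illustration (the text writes them multiplicatively), and the function name `order_of_diagonal` is hypothetical. It computes the order of the element (1, 1), the additive analogue of x = (y, z).

```python
from math import gcd

def order_of_diagonal(r, s):
    """Order of the element (1, 1) in the additive product group Z/r x Z/s."""
    k, a, b = 1, 1 % r, 1 % s
    while (a, b) != (0, 0):
        a, b = (a + 1) % r, (b + 1) % s
        k += 1
    return k

print(order_of_diagonal(2, 3))   # (1, 1) generates the whole product
print(order_of_diagonal(2, 2))   # (1, 1) does not generate the whole product
```

In this model the element (1, 1) has order rs, and so generates the product, exactly when r and s have no common factor.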


The next proposition describes product groups. 


Proposition 2.11.4 Let H and K be subgroups of a group G, and let f: H × K → G be the
multiplication map, defined by f(h, k) = hk. Its image is the set HK = {hk | h ∈ H, k ∈ K}.

(a) f is injective if and only if H ∩ K = {1}.

(b) f is a homomorphism from the product group H × K to G if and only if elements of K
commute with elements of H: hk = kh.

(c) If H is a normal subgroup of G, then HK is a subgroup of G.

(d) f is an isomorphism from the product group H × K to G if and only if H ∩ K = {1},
HK = G, and also H and K are normal subgroups of G.

It is important to note that the multiplication map may be bijective though it isn’t a group
homomorphism. This happens, for instance, when G = S3, and with the usual notation,
H = ⟨x⟩ and K = ⟨y⟩.
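
The S3 example can be verified directly. The sketch below is an illustration under assumptions: permutations are tuples on {0, 1, 2}, the particular tuples chosen for x and y are one realization of the generators, and the names `f`, `pairs`, `is_bijective`, and `is_homomorphism` are ours.

```python
def compose(p, q):
    """Return p∘q: apply q first, then p."""
    return tuple(p[i] for i in q)

e, x, y = (0, 1, 2), (1, 2, 0), (1, 0, 2)   # identity, a 3-cycle, a transposition
H = [e, x, compose(x, x)]                   # the subgroup generated by x, order 3
K = [e, y]                                  # the subgroup generated by y, order 2

def f(h, k):
    """The multiplication map H x K -> G."""
    return compose(h, k)

pairs = [(h, k) for h in H for k in K]
image = {f(h, k) for h, k in pairs}

# Six pairs with six distinct products in a group of order 6: f is bijective.
is_bijective = len(image) == len(pairs) == 6

# Compare f of a product in H x K with the product of the values of f.
is_homomorphism = all(
    f(compose(h1, h2), compose(k1, k2)) == compose(f(h1, k1), f(h2, k2))
    for h1, k1 in pairs for h2, k2 in pairs
)
print(is_bijective, is_homomorphism)
```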


Proof. (a) If H ∩ K contains an element x ≠ 1, then x⁻¹ is in H, and f(x⁻¹, x) = 1 = f(1, 1),
so f is not injective. Suppose that H ∩ K = {1}. Let (h_1, k_1) and (h_2, k_2) be elements of
H × K such that h_1k_1 = h_2k_2. We multiply both sides of this equation on the left by h_1⁻¹ and
on the right by k_2⁻¹, obtaining k_1k_2⁻¹ = h_1⁻¹h_2. The left side is an element of K and the right
side is an element of H. Since H ∩ K = {1}, k_1k_2⁻¹ = h_1⁻¹h_2 = 1. Then k_1 = k_2, h_1 = h_2,
and (h_1, k_1) = (h_2, k_2).

(b) Let (h_1, k_1) and (h_2, k_2) be elements of the product group H × K. The product of these
elements in the product group H × K is (h_1h_2, k_1k_2), and f(h_1h_2, k_1k_2) = h_1h_2k_1k_2, while
f(h_1, k_1)f(h_2, k_2) = h_1k_1h_2k_2. These elements are equal if and only if h_2k_1 = k_1h_2.

(c) Suppose that H is a normal subgroup. We note that KH is a union of the left cosets
kH with k in K, and that HK is a union of the right cosets Hk. Since H is normal,
kH = Hk, and therefore HK = KH. Closure of HK under multiplication follows, because
HKHK = HHKK = HK. Also, (hk)⁻¹ = k⁻¹h⁻¹ is in KH = HK. This proves closure of
HK under inverses.

(d) Suppose that H and K satisfy the conditions given. Then f is both injective and surjective,
so it is bijective. According to (b), it is an isomorphism if and only if hk = kh for all h in H
and k in K. Consider the commutator (hkh⁻¹)k⁻¹ = h(kh⁻¹k⁻¹). Since K is normal, the left
side is in K, and since H is normal, the right side is in H. Since H ∩ K = {1}, hkh⁻¹k⁻¹ = 1,
and hk = kh. Conversely, if f is an isomorphism, one may verify the conditions listed in the
isomorphic group H × K instead of in G. □


We use this proposition to classify groups of order 4:

Proposition 2.11.5 There are two isomorphism classes of groups of order 4, the class of the
cyclic group C4 of order 4 and the class of the Klein Four Group, which is isomorphic to the
product C2 × C2 of two groups of order 2.

Proof. Let G be a group of order 4. The order of any element x of G divides 4, so there are
two cases to consider:

Case 1: G contains an element of order 4. Then G is a cyclic group of order 4.
Case 2: Every element of G except the identity has order 2.

In this case, x = x⁻¹ for every element x of G. Let x and y be two elements of G. Then
xy has order 1 or 2, so xyx⁻¹y⁻¹ = (xy)(xy) = 1. This shows that x and y commute (2.6.5), and
since these are arbitrary elements, G is abelian. So every subgroup is normal. We choose
distinct elements x and y of order 2 in G, and we let H and K be the cyclic groups of order 2
that they generate. Proposition 2.11.4(d) shows that G is isomorphic to the product group H × K. □


2.12 QUOTIENT GROUPS 


In this section we show that a law of composition can be defined on the set of cosets of a 
normal subgroup N of any group G. This law makes the set of cosets of a normal subgroup 
into a group, called a quotient group. 

Addition of congruence classes of integers modulo n is an example of the quotient
construction. Another familiar example is addition of angles. Every real number represents
an angle, and two real numbers represent the same angle if they differ by an integer multiple
of 2π. The group N of integer multiples of 2π is a subgroup of the additive group R⁺ of real
numbers, and angles correspond naturally to (additive) cosets θ + N of N in R⁺. The group
of angles is the quotient group whose elements are the cosets.

The set of cosets of a normal subgroup N of a group G is often denoted by G/N.

(2.12.1) G/N is the set of cosets of N in G.

When we regard a coset C as an element of the set of cosets, the bracket notation [C]
may be used. If C = aN, we may also use the bar notation to denote the element [C] by a̅,
and then we would denote the set of cosets by G̅:

G̅ = G/N.

Theorem 2.12.2 Let N be a normal subgroup of a group G, and let G̅ denote the set of
cosets of N in G. There is a law of composition on G̅ that makes this set into a group, such
that the map π: G → G̅ defined by π(a) = a̅ is a surjective homomorphism whose kernel
is N.

• The map π is often referred to as the canonical map from G to G̅. The word “canonical”
indicates that this is the only map that we might reasonably be talking about.


The next corollary is very simple, but it is important enough to single out: 


Corollary 2.12.3 Let N be a normal subgroup of a group G, and let G̅ denote the set
of cosets of N in G. Let π: G → G̅ be the canonical homomorphism. Let a_1, ..., a_k be
elements of G such that the product a_1 ··· a_k is in N. Then a̅_1 ··· a̅_k = 1̅.

Proof. Let p = a_1 ··· a_k. Then p is in N, so π(p) = p̅ = 1̅. Since π is a homomorphism,
a̅_1 ··· a̅_k = p̅. □

Proof of Theorem 2.12.2. There are several things to be done. We must

• define a law of composition on G̅,
• prove that the law makes G̅ into a group,
• prove that π is a surjective homomorphism, and
• prove that the kernel of π is N.

We use the following notation: If A and B are subsets of a group G, then AB denotes the
set of products ab:

(2.12.4) AB = {x ∈ G | x = ab for some a ∈ A and b ∈ B}.

We will call this a product set, though in some other contexts the phrase “product set” refers
to the set A × B of pairs of elements.


Lemma 2.12.5 Let N be a normal subgroup of a group G, and let aN and bN be cosets of
N. The product set (aN)(bN) is also a coset. It is equal to the coset abN.

We note that the set (aN)(bN) consists of all elements of G that can be written in the
form anbn′, with n and n′ in N.

Proof. Since N is a subgroup, NN = N. Since N is normal, left and right cosets are equal:
Nb = bN (2.8.17). The lemma is proved by the following formal manipulation:

(aN)(bN) = a(Nb)N = a(bN)N = abNN = abN. □

This lemma allows us to define multiplication on the set G̅ = G/N. Using the bracket
notation (2.7.8), the definition is this: If C_1 and C_2 are cosets, then [C_1][C_2] = [C_1C_2],
where C_1C_2 is the product set. The lemma shows that this product set is another coset. To
compute the product [C_1][C_2], take any elements a in C_1 and b in C_2. Then C_1 = aN,
C_2 = bN, and C_1C_2 is the coset abN that contains ab. So we have the very natural formula

(2.12.6) [aN][bN] = [abN] or a̅ · b̅ = a̅b̅.

Then by definition of the map π in (2.12.2),

(2.12.7) π(a)π(b) = a̅ · b̅ = a̅b̅ = π(ab).

The fact that π is a homomorphism will follow from (2.12.7), once we show that G̅ is a group.
Since the canonical map π is surjective (2.7.8), the next lemma proves this.


Lemma 2.12.8 Let G be a group, and let Y be a set with a law of composition, both 
laws written with multiplicative notation. Let $\varphi: G \to Y$ be a surjective map with the 
homomorphism property, that $\varphi(ab) = \varphi(a)\varphi(b)$ for all a and b in G. Then Y is a group 
and $\varphi$ is a homomorphism. 

Proof. The group axioms that are true in G are carried over to Y by the surjective map $\varphi$. 
Here is the proof of the associative law: Let $y_1, y_2, y_3$ be elements of Y. Since $\varphi$ is surjective, 
$y_i = \varphi(x_i)$ for some $x_i$ in G. Then 


$(y_1y_2)y_3 = (\varphi(x_1)\varphi(x_2))\varphi(x_3) = \varphi(x_1x_2)\varphi(x_3) = \varphi((x_1x_2)x_3)$ 
$\overset{*}{=} \varphi(x_1(x_2x_3)) = \varphi(x_1)\varphi(x_2x_3) = \varphi(x_1)(\varphi(x_2)\varphi(x_3)) = y_1(y_2y_3)$. 


The equality marked with an asterisk is the associative law in G. The other equalities follow 
from the homomorphism property of $\varphi$. The verifications of the other group axioms are 
similar. □ 


The only thing remaining to be verified is that the kernel of the homomorphism $\pi$ is 
the subgroup N. Well, $\pi(a) = \pi(1)$ if and only if $\bar{a} = \bar{1}$, or $[aN] = [1N]$, and this is true if 
and only if a is an element of N. □ 


(2.12.9) A Schematic Diagram of Coset Multiplication. 


Note: Our assumption that N be a normal subgroup of G is crucial to Lemma 2.12.5. If H 
is not normal, there will be left cosets $C_1$ and $C_2$ of H in G such that the product set $C_1C_2$ 
does not lie in a single left coset. Going back once more to the subgroup H = <y> of $S_3$, 
the product set (1H)(xH) contains four elements: $\{1, y\}\{x, xy\} = \{x, xy, x^2y, x^2\}$. It is not 
a coset. The subgroup H is not normal. 
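
The contrast between a normal and a non-normal subgroup can be checked by brute force. The following is a minimal sketch, not part of the text: it models $S_3$ as permutations of {0, 1, 2}, and the particular tuples chosen for x and y, as well as the helper names, are assumptions made for illustration.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p) on {0, 1, 2}."""
    return tuple(p[q[i]] for i in range(len(q)))

S3 = set(permutations(range(3)))
identity = (0, 1, 2)
x = (1, 2, 0)  # a 3-cycle, standing in for x (assumed representation)
y = (1, 0, 2)  # a transposition, standing in for y (assumed representation)

def product_set(A, B):
    """All products ab with a in A and b in B."""
    return {compose(a, b) for a in A for b in B}

def left_cosets(K):
    """The set of left cosets gK, each stored as a frozenset."""
    return {frozenset(compose(g, k) for k in K) for g in S3}

def products_are_cosets(K):
    """True if the product set of any two left cosets of K is again a left coset."""
    cosets = left_cosets(K)
    return all(frozenset(product_set(C1, C2)) in cosets
               for C1 in cosets for C2 in cosets)

N = {identity, x, compose(x, x)}  # the normal subgroup <x>
H = {identity, y}                 # the non-normal subgroup <y>
xH = {compose(x, h) for h in H}

print(products_are_cosets(N))     # coset products stay cosets for N
print(products_are_cosets(H))     # this fails for H
print(len(product_set(H, xH)))    # size of the product set (1H)(xH)
```

For N every product of cosets is a coset, while for H the product set (1H)(xH) is too large to be a coset of a subgroup of order 2.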


The next theorem relates the quotient group construction to a general group homo- 
morphism, and it provides a fundamental method of identifying quotient groups. 


Theorem 2.12.10 First Isomorphism Theorem. Let $\varphi: G \to G'$ be a surjective group 
homomorphism with kernel N. The quotient group $\bar{G} = G/N$ is isomorphic to the image 
G′. To be precise, let $\pi: G \to \bar{G}$ be the canonical map. There is a unique isomorphism 
$\bar{\varphi}: \bar{G} \to G'$ such that $\varphi = \bar{\varphi} \circ \pi$. 

Proof. The elements of $\bar{G}$ are the cosets of N, and they are also the fibres of the map $\varphi$ 
(2.7.15). The map $\bar{\varphi}$ referred to in the theorem is the one that sends a nonempty fibre to 
its image: $\bar{\varphi}(\bar{x}) = \varphi(x)$. For any surjective map of sets $\varphi: G \to G'$, one can form the set 
$\bar{G}$ of fibres, and then one obtains a diagram as above, in which $\bar{\varphi}$ is the bijective map that 
sends a fibre to its image. When $\varphi$ is a group homomorphism, $\bar{\varphi}$ is an isomorphism because 


$\bar{\varphi}(\bar{a}\,\bar{b}) = \varphi(ab) = \varphi(a)\varphi(b) = \bar{\varphi}(\bar{a})\bar{\varphi}(\bar{b})$. □ 


Corollary 2.12.11 Let $\varphi: G \to G'$ be a group homomorphism with kernel N and image H′. 
The quotient group $\bar{G} = G/N$ is isomorphic to the image H′. □ 


Two quick examples: The image of the absolute value map $\mathbb{C}^\times \to \mathbb{R}^\times$ is the group 
of positive real numbers, and its kernel is the unit circle U. The theorem asserts that the 
quotient group $\mathbb{C}^\times/U$ is isomorphic to the multiplicative group of positive real numbers. 
The determinant is a surjective homomorphism $GL_n(\mathbb{R}) \to \mathbb{R}^\times$, whose kernel is the special 
linear group $SL_n(\mathbb{R})$. So the quotient $GL_n(\mathbb{R})/SL_n(\mathbb{R})$ is isomorphic to $\mathbb{R}^\times$. 
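
A small finite case makes the statement concrete. The sketch below is an illustration under assumptions of our own choosing, not an example from the text: it takes the additive groups Z/12Z and Z/4Z with the reduction map x ↦ x mod 4, checks that the fibres of the map are exactly the cosets of its kernel, and checks that the induced map on cosets is a well-defined bijective homomorphism.

```python
n, m = 12, 4          # assumed example: reduction Z/12Z -> Z/4Z (valid since 4 divides 12)
G = range(n)

def phi(x):
    """The surjective homomorphism x -> x mod 4 (additive notation)."""
    return x % m

kernel = frozenset(x for x in G if phi(x) == 0)

# Cosets a + N of the kernel, and fibres of phi, both as sets of frozensets.
cosets = {frozenset((a + k) % n for k in kernel) for a in G}
fibres = {frozenset(x for x in G if phi(x) == t) for t in range(m)}

# The induced map sends a coset to the set of values phi takes on it;
# it is well defined when that set has exactly one element.
phi_bar = {C: {phi(x) for x in C} for C in cosets}

def coset_of(a):
    """The coset of the kernel that contains a."""
    return frozenset((a + k) % n for k in kernel)

def is_homomorphism():
    """Check phi_bar(coset(a) + coset(b)) = phi_bar(coset(a)) + phi_bar(coset(b))."""
    for a in G:
        for b in G:
            lhs = next(iter(phi_bar[coset_of((a + b) % n)]))
            rhs = (next(iter(phi_bar[coset_of(a)])) + next(iter(phi_bar[coset_of(b)]))) % m
            if lhs != rhs:
                return False
    return True

print(sorted(kernel))
print(cosets == fibres)
print(is_homomorphism())
```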

There are also theorems called the Second and the Third Isomorphism Theorems, 
though they are less important. 


There are thus very many different kinds of quantities, 

which cannot well be enumerated; 

and from this arise the different parts of mathematics, 

each of which is concerned with a particular kind of quantity. 


—Leonhard Euler 


EXERCISES 


Section 1 Laws of Composition 


1.1. Let S be a set. Prove that the law of composition defined by ab = a for all a and b in S is 
associative. For which sets does this law have an identity? 


1.2. Prove the properties of inverses that are listed near the end of the section. 


1.3. Let N denote the set {1, 2, 3, ...} of natural numbers, and let $s: N \to N$ be the shift map, 
defined by s(n) = n + 1. Prove that s has no right inverse, but that it has infinitely many 
left inverses. 


Section 2 Groups and Subgroups 


2.1. Make a multiplication table for the symmetric group $S_3$. 

2.2. Let S be a set with an associative law of composition and with an identity element. Prove 
that the subset consisting of the invertible elements in S is a group. 

2.3. Let x, y, z, and w be elements of a group G. 


(a) Solve for y, given that $xyz^{-1}w = 1$. 


(b) Suppose that xyz = 1. Does it follow that yzx = 1? Does it follow that yxz = 1? 
2.4. In which of the following cases is H a subgroup of G? 


(a) $G = GL_n(\mathbb{C})$ and $H = GL_n(\mathbb{R})$. 

(b) $G = \mathbb{R}^\times$ and H = {1, −1}. 

(c) $G = \mathbb{Z}^+$ and H is the set of positive integers. 

(d) $G = \mathbb{R}^\times$ and H is the set of positive reals. 

(e) $G = GL_2(\mathbb{R})$ and H is the set of matrices [2 × 2 matrix, entries illegible in source], with a ≠ 0. 


2.5. In the definition of a subgroup, the identity element in H is required to be the identity 
of G. One might require only that H have an identity element, not that it need be the 
same as the identity in G. Show that if H has an identity at all, then it is the identity in 
G. Show that the analogous statement is true for inverses. 


2.6. Let G be a group. Define an opposite group $G^\circ$ with law of composition a ∗ b as follows: 
The underlying set is the same as G, but the law of composition is a ∗ b = ba. Prove that 
$G^\circ$ is a group. 


Section 3 Subgroups of the Additive Group of Integers 


3.1. Let a = 123 and b = 321. Compute d = gcd(a, b), and express d as an integer 
combination ra + sb. 


3.2. Prove that if a and b are positive integers whose sum is a prime p, their greatest common 
divisor is 1. 


3.3. (a) Define the greatest common divisor of a set $\{a_1, \ldots, a_n\}$ of n integers. Prove that it 
exists, and that it is an integer combination of $a_1, \ldots, a_n$. 

(b) Prove that if the greatest common divisor of $\{a_1, \ldots, a_n\}$ is d, then the greatest 
common divisor of $\{a_1/d, \ldots, a_n/d\}$ is 1. 


Section 4 Cyclic Groups 


4.1. Let a and b be elements of a group G. Assume that a has order 7 and that $a^3b = ba^3$. 
Prove that ab = ba. 


4.2. An nth root of unity is a complex number z such that $z^n = 1$. 

(a) Prove that the nth roots of unity form a cyclic subgroup of $\mathbb{C}^\times$ of order n. 
(b) Determine the product of all the nth roots of unity. 


4.3. Let a and b be elements of a group G. Prove that ab and ba have the same order. 

4.4. Describe all groups G that contain no proper subgroup. 


4.5. Prove that every subgroup of a cyclic group is cyclic. Do this by working with exponents, 
and use the description of the subgroups of $\mathbb{Z}^+$. 


4.6. (a) Let G be a cyclic group of order 6. How many of its elements generate G? Answer 
the same question for cyclic groups of orders 5 and 8. 

(b) Describe the number of elements that generate a cyclic group of arbitrary order n. 


4.7. Let x and y be elements of a group G. Assume that each of the elements x, y, and xy has 
order 2. Prove that the set H = {1, x, y, xy} is a subgroup of G, and that it has order 4. 


4.8. (a) Prove that the elementary matrices of the first and third types (1.2.4) generate 
$GL_n(\mathbb{R})$. 

(b) Prove that the elementary matrices of the first type generate $SL_n(\mathbb{R})$. Do the 2 × 2 
case first. 


4.9. How many elements of order 2 does the symmetric group $S_4$ contain? 

4.10. Show by example that the product of elements of finite order in a group need not have 
finite order. What if the group is abelian? 


4.11. (a) Adapt the method of row reduction to prove that the transpositions generate the 
symmetric group $S_n$. 

(b) Prove that, for n ≥ 3, the three-cycles generate the alternating group $A_n$. 


Section 5 Homomorphisms 


5.1. Let $\varphi: G \to G'$ be a surjective homomorphism. Prove that if G is cyclic, then G′ is cyclic, 
and if G is abelian, then G′ is abelian. 


5.2. Prove that the intersection $K \cap H$ of subgroups of a group G is a subgroup of H, and 
that if K is a normal subgroup of G, then $K \cap H$ is a normal subgroup of H. 


5.3. Let U denote the group of invertible upper triangular 2 × 2 matrices 
$A = \begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$, and let $\varphi: U \to \mathbb{R}^\times$ be the map that sends $A \rightsquigarrow a^2$. 
Prove that $\varphi$ is a homomorphism, and determine its kernel and image. 


5.4. Let $f: \mathbb{R}^+ \to \mathbb{C}^\times$ be the map $f(x) = e^{ix}$. Prove that f is a homomorphism, and determine 
its kernel and image. 


5.5. Prove that the n × n matrices that have the block form $M = \begin{bmatrix} A & B \\ 0 & D \end{bmatrix}$, with 
A in $GL_r(\mathbb{R})$ and D in $GL_{n-r}(\mathbb{R})$, form a subgroup H of $GL_n(\mathbb{R})$, and that the 
map $H \to GL_r(\mathbb{R})$ that sends $M \rightsquigarrow A$ is a homomorphism. What is its kernel? 


5.6. Determine the center of $GL_n(\mathbb{R})$. 

Hint: You are asked to determine the invertible matrices A that commute with every 
invertible matrix B. Do not test with a general matrix B. Test with elementary matrices. 


Section 6 Isomorphisms 


6.1. Let G′ be the group of real matrices of the form $\begin{bmatrix} 1 & x \\ 0 & 1 \end{bmatrix}$. Is the map $\mathbb{R}^+ \to G'$ that 
sends x to this matrix an isomorphism? 


6.2. Describe all homomorphisms $\varphi: \mathbb{Z}^+ \to \mathbb{Z}^+$. Determine which are injective, which are 
surjective, and which are isomorphisms. 


6.3. Show that the functions f = 1/x, g = (x − 1)/x generate a group of functions, the law of 
composition being composition of functions, that is isomorphic to the symmetric group $S_3$. 


6.4. Prove that in a group, the products ab and ba are conjugate elements. 


6.5. Decide whether or not the two matrices A and B [2 × 2 matrices, entries illegible in source] 
are conjugate elements of the general linear group $GL_2(\mathbb{R})$. 


6.6. Are the matrices [two 2 × 2 matrices, entries illegible in source] conjugate elements of the 
group $GL_2(\mathbb{R})$? Are they conjugate elements of $SL_2(\mathbb{R})$? 


6.7. Let H be a subgroup of G, and let g be a fixed element of G. The conjugate subgroup 
$gHg^{-1}$ is defined to be the set of all conjugates $ghg^{-1}$, with h in H. Prove that $gHg^{-1}$ is 
a subgroup of G. 


6.8. Prove that the map $A \rightsquigarrow (A^t)^{-1}$ is an automorphism of $GL_n(\mathbb{R})$. 
6.9. Prove that a group G and its opposite group $G^\circ$ (Exercise 2.6) are isomorphic. 
6.10. Find all automorphisms of 
(a) a cyclic group of order 10, (b) the symmetric group $S_3$. 


6.11. Let a be an element of a group G. Prove that if the set {1, a} is a normal subgroup of G, 
then a is in the center of G. 


Section 7 Equivalence Relations and Partitions 


7.1. Let G be a group. Prove that the relation a ~ b if $b = gag^{-1}$ for some g in G is an 
equivalence relation on G. 


7.2. An equivalence relation on S is determined by the subset R of the set S × S consisting of 
those pairs (a, b) such that a ~ b. Write the axioms for an equivalence relation in terms 
of the subset R. 


7.3. With the notation of Exercise 7.2, is the intersection $R \cap R'$ of two equivalence relations 
R and R′ an equivalence relation? Is the union? 


7.4. A relation R on the set of real numbers can be thought of as a subset of the (x, y)-plane. 
With the notation of Exercise 7.2, explain the geometric meaning of the reflexive and 
symmetric properties. 


7.5. With the notation of Exercise 7.2, each of the following subsets R of the (x, y)-plane 
defines a relation on the set $\mathbb{R}$ of real numbers. Determine which of the axioms (2.7.3) 
are satisfied: (a) the set $\{(s, s) \mid s \in \mathbb{R}\}$, (b) the empty set, (c) the locus $\{xy + 1 = 0\}$, 
(d) the locus $\{x^2y - xy^2 - x + y = 0\}$. 


7.6. How many different equivalence relations can be defined on a set of five elements? 


Section 8 Cosets 


8.1. Let H be the cyclic subgroup of the alternating group $A_4$ generated by the permutation 
(123). Exhibit the left and the right cosets of H explicitly. 


8.2. In the additive group $\mathbb{R}^m$ of vectors, let W be the set of solutions of a system of homo- 
geneous linear equations AX = 0. Show that the set of solutions of an inhomogeneous 
system AX = B is either empty, or else it is an (additive) coset of W. 


8.3. Does every group whose order is a power of a prime p contain an element of order p? 
8.4. Does a group of order 35 contain an element of order 5? of order 7? 


8.5. A finite group contains an element x of order 10 and also an element y of order 6. What 
can be said about the order of G? 


8.6. Let $\varphi: G \to G'$ be a group homomorphism. Suppose that |G| = 18, |G′| = 15, and that 
$\varphi$ is not the trivial homomorphism. What is the order of the kernel? 


8.7. A group G of order 22 contains elements x and y, where x ≠ 1 and y is not a power of x. 
Prove that the subgroup generated by these elements is the whole group G. 


8.8. Let G be a group of order 25. Prove that G has at least one subgroup of order 5, and that 
if it contains only one subgroup of order 5, then it is a cyclic group. 


8.9. Let G be a finite group. Under what circumstances is the map $\varphi: G \to G$ defined by 
$\varphi(x) = x^2$ an automorphism of G? 


8.10. Prove that every subgroup of index 2 is a normal subgroup, and show by example that a 
subgroup of index 3 need not be normal. 


8.11. Let G and H be the following subgroups of $GL_2(\mathbb{R})$: 

$G = \left\{ \begin{bmatrix} x & y \\ 0 & 1 \end{bmatrix} \right\}$, $H = \left\{ \begin{bmatrix} x & 0 \\ 0 & 1 \end{bmatrix} \right\}$, 

with x and y real and x > 0. An element of G can be represented by a point in the right 
half plane. Make sketches showing the partitions of the half plane into left cosets and into 
right cosets of H. 


8.12. Let S be a subset of a group G that contains the identity element 1, and such that the left 
cosets aS, with a in G, partition G. Prove that S is a subgroup of G. 


8.13. Let S be a set with a law of composition. A partition $\Pi_1 \cup \Pi_2 \cup \cdots$ of S is compatible 
with the law of composition if for all i and j, the product set 

$\Pi_i\Pi_j = \{xy \mid x \in \Pi_i,\ y \in \Pi_j\}$ 

is contained in a single subset $\Pi_k$ of the partition. 

(a) The set $\mathbb{Z}$ of integers can be partitioned into the three sets [Pos], [Neg], [{0}]. Discuss 
the extent to which the laws of composition + and × are compatible with this 
partition. 

(b) Describe all partitions of the integers that are compatible with the operation +. 


Section 9 Modular Arithmetic 


9.1. For which integers n does 2 have a multiplicative inverse in $\mathbb{Z}/\mathbb{Z}n$? 

9.2. What are the possible values of $a^2$ modulo 4? modulo 8? 

9.3. Prove that every integer a is congruent to the sum of its decimal digits modulo 9. 

9.4. Solve the congruence $2x \equiv 5$ modulo 9 and modulo 6. 

9.5. Determine the integers n for which the pair of congruences $2x - y \equiv 1$ and 
$4x + 3y \equiv 2$ modulo n has a solution. 

9.6. Prove the Chinese Remainder Theorem: Let a, b, u, v be integers, and assume that the 
greatest common divisor of a and b is 1. Then there is an integer x such that $x \equiv u$ modulo 
a and $x \equiv v$ modulo b. 

Hint: Do the case u = 0 and v = 1 first. 

9.7. Determine the order of each of the matrices A and B [2 × 2 matrices, entries illegible in 
source] when the matrix entries are interpreted modulo 3. 
Section 10 The Correspondence Theorem 


10.1. Describe how to tell from the cycle decomposition whether a permutation is odd or even. 

10.2. Let H and K be subgroups of a group G. 

(a) Prove that the intersection $xH \cap yK$ of two cosets of H and K is either empty or 
else is a coset of the subgroup $H \cap K$. 

(b) Prove that if H and K have finite index in G then $H \cap K$ also has finite index in G. 


10.3. Let G and G′ be cyclic groups of orders 12 and 6, generated by elements x and y, 
respectively, and let $\varphi: G \to G'$ be the map defined by $\varphi(x^i) = y^i$. Exhibit the 
correspondence referred to in the Correspondence Theorem explicitly. 


10.4. With the notation of the Correspondence Theorem, let H and H′ be corresponding 
subgroups. Prove that [G : H] = [G′ : H′]. 


10.5. With reference to the homomorphism $S_4 \to S_3$ described in Example 2.5.13, determine 
the six subgroups of $S_4$ that contain K. 


Section 11 Product Groups 


11.1. Let x be an element of order r of a group G, and let y be an element of G′ of order s. 
What is the order of (x, y) in the product group G × G′? 

11.2. What does Proposition 2.11.4 tell us when, with the usual notation for the symmetric 
group $S_3$, K and H are the subgroups <y> and <x>? 

11.3. Prove that the product of two infinite cyclic groups is not infinite cyclic. 


11.4. In each of the following cases, determine whether or not G is isomorphic to the product 
group H × K. 

(a) $G = \mathbb{R}^\times$, H = {±1}, K = {positive real numbers}. 

(b) G = {invertible upper triangular 2 × 2 matrices}, H = {invertible diagonal matrices}, 
K = {upper triangular matrices with diagonal entries 1}. 

(c) $G = \mathbb{C}^\times$, H = {unit circle}, K = {positive real numbers}. 


11.5. Let $G_1$ and $G_2$ be groups, and let $Z_i$ be the center of $G_i$. Prove that the center of the 
product group $G_1 \times G_2$ is $Z_1 \times Z_2$. 

11.6. Let G be a group that contains normal subgroups of orders 3 and 5, respectively. Prove 
that G contains an element of order 15. 


11.7. Let H be a subgroup of a group G, let $\varphi: G \to H$ be a homomorphism whose restriction 
to H is the identity map, and let N be its kernel. What can one say about the product 
map $H \times N \to G$? 


11.8. Let G, G′, and H be groups. Establish a bijective correspondence between homomor- 
phisms $\Phi: H \to G \times G'$ from H to the product group and pairs $(\varphi, \varphi')$ consisting of a 
homomorphism $\varphi: H \to G$ and a homomorphism $\varphi': H \to G'$. 


11.9. Let H and K be subgroups of a group G. Prove that the product set HK is a subgroup 
of G if and only if HK = KH. 


Section 12 Quotient Groups 


12.1. Show that if a subgroup H of a group G is not normal, there are left cosets aH and bH 
whose product is not a coset. 


12.2. In the general linear group $GL_3(\mathbb{R})$, consider the subsets 

$H = \begin{bmatrix} 1 & * & * \\ 0 & 1 & * \\ 0 & 0 & 1 \end{bmatrix}$, and $K = \begin{bmatrix} 1 & 0 & * \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, 

where * represents an arbitrary real number. Show that H is a subgroup of $GL_3$, that K 
is a normal subgroup of H, and identify the quotient group H/K. Determine the center 
of H. 


12.3. Let P be a partition of a group G with the property that for any pair of elements A, B of 
the partition, the product set AB is contained entirely within another element C of the 
partition. Let N be the element of P that contains 1. Prove that N is a normal subgroup 
of G and that P is the set of its cosets. 


12.4. Let H = {±1, ±i} be the subgroup of $G = \mathbb{C}^\times$ of fourth roots of unity. Describe the 
cosets of H in G explicitly. Is G/H isomorphic to G? 


12.5. Let G be the group of upper triangular real matrices $\begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$, with a and d different 
from zero. For each of the following subsets, determine whether or not S is a subgroup, 
and whether or not S is a normal subgroup. If S is a normal subgroup, identify the 
quotient group G/S. 

(i) S is the subset defined by b = 0. 
(ii) S is the subset defined by d = 1. 
(iii) S is the subset defined by a = d. 


Miscellaneous Problems 


M.1. Describe the column vectors $(a, c)^t$ that occur as the first column of an integer matrix A 
whose inverse is also an integer matrix. 

M.2. (a) Prove that every group of even order contains an element of order 2. 

(b) Prove that every group of order 21 contains an element of order 3. 


M.3. Classify groups of order 6 by analyzing the following three cases: 

(i) G contains an element of order 6. 
(ii) G contains an element of order 3 but none of order 6. 
(iii) All elements of G have order 1 or 2. 


M.4. A semigroup S is a set with an associative law of composition and with an identity. 
Elements are not required to have inverses, and the Cancellation Law need not hold. A 
semigroup S is said to be generated by an element s if the set $\{1, s, s^2, \ldots\}$ of nonnegative 
powers of s is equal to S. Classify semigroups that are generated by one element. 


M.5. Let S be a finite semigroup (see Exercise M.4) in which the Cancellation Law 2.2.3 holds. 
Prove that S is a group. 


*M.6. Let $a = (a_1, \ldots, a_k)$ and $b = (b_1, \ldots, b_k)$ be points in k-dimensional space $\mathbb{R}^k$. A 
path from a to b is a continuous function on the unit interval [0, 1] with values in $\mathbb{R}^k$, a 
function $X: [0, 1] \to \mathbb{R}^k$, sending $t \rightsquigarrow X(t) = (x_1(t), \ldots, x_k(t))$, such that X(0) = a and 
X(1) = b. If S is a subset of $\mathbb{R}^k$ and if a and b are in S, define a ~ b if a and b can be 
joined by a path lying entirely in S. 

(a) Show that ~ is an equivalence relation on S. Be careful to check that any paths you 
construct stay within the set S. 

(b) A subset S is path connected if a ~ b for any two points a and b in S. Show that every 
subset S is partitioned into path-connected subsets with the property that two points 
in different subsets cannot be connected by a path in S. 

(c) Which of the following loci in $\mathbb{R}^2$ are path-connected: $\{x^2 + y^2 = 1\}$, $\{xy = 0\}$, 
$\{xy = 1\}$? 


*M.7. The set of n × n matrices can be identified with the space $\mathbb{R}^{n \times n}$. Let G be a subgroup of 
$GL_n(\mathbb{R})$. With the notation of Exercise M.6, prove: 

(a) If A, B, C, D are in G, and if there are paths in G from A to B and from C to D, then 
there is a path in G from AC to BD. 

(b) The set of matrices that can be joined to the identity I forms a normal subgroup of 
G. (It is called the connected component of G.) 


*M.8. (a) The group $SL_n(\mathbb{R})$ is generated by elementary matrices of the first type (see 
Exercise 4.8). Use this fact to prove that $SL_n(\mathbb{R})$ is path-connected. 

(b) Show that $GL_n(\mathbb{R})$ is a union of two path-connected subsets, and describe them. 


M.9. (double cosets) Let H and K be subgroups of a group G, and let g be an element of G. 
The set $HgK = \{x \in G \mid x = hgk \text{ for some } h \in H, k \in K\}$ is called a double coset. Do 
the double cosets partition G? 


M.10. Let H be a subgroup of a group G. Show that the double cosets (see Exercise M.9) 

$HgH = \{h_1gh_2 \mid h_1, h_2 \in H\}$ 

are the left cosets gH if and only if H is normal. 


*M.11. Most invertible matrices can be written as a product A = LU of a lower triangular matrix 
L and an upper triangular matrix U, where in addition all diagonal entries of U are 1. 

(a) Explain how to compute L and U when the matrix A is given. 
(b) Prove uniqueness, that there is at most one way to write A as such a product. 

(c) Show that every invertible matrix can be written as a product LPU, where L, U are 
as above and P is a permutation matrix. 

(d) Describe the double cosets LgU (see Exercise M.9). 


M.12. (postage stamp problem) Let a and b be positive, relatively prime integers. 

(a) Prove that every sufficiently large positive integer n can be obtained as ra + sb, 
where r and s are positive integers. 

(b) Determine the largest integer that is not of this form. 


M.13. (a game) The starting position is the point (1, 1), and a permissible "move" replaces a 
point (a, b) by one of the points (a + b, b) or (a, a + b). So the position after the first 
move will be either (2, 1) or (1, 2). Determine the points that can be reached. 


M.14. (generating $SL_2(\mathbb{Z})$) Prove that the two matrices 

$E = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, $E' = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ 

generate the group $SL_2(\mathbb{Z})$ of all integer matrices with determinant 1. Remember that 
the subgroup they generate consists of all elements that can be expressed as products 
using the four elements $E$, $E'$, $E^{-1}$, $E'^{-1}$. 

Hint: Do not try to write a matrix directly as a product of the generators. Use row 
reduction. 


M.15. (the semigroup generated by elementary matrices) Determine the semigroup S (see 
Exercise M.4) of matrices A that can be written as a product, of arbitrary length, each of 
whose terms is one of the two matrices 

$\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. 

Show that every element of S can be expressed as such a product in exactly one way. 


M.16.¹ (the homophonic group: a mathematical diversion) By definition, English words have 
the same pronunciation if their phonetic spellings in the dictionary are the same. The 
homophonic group H is generated by the letters of the alphabet, subject to the following 
relations: English words with the same pronunciation represent equal elements of the 
group. Thus be = bee, and since H is a group, we can cancel be to conclude that e = 1. 
Try to determine the group H. 


¹ I learned this problem from a paper by Mestre, Schoof, Washington and Zagier. 


CHAPTER 3 


Vector Spaces 


Always begin with the simplest examples. 


—David Hilbert 


3.1 SUBSPACES OF $\mathbb{R}^n$ 


Our basic models of vector spaces, the topic of this chapter, are subspaces of the space $\mathbb{R}^n$ of 
n-dimensional real vectors. We discuss them in this section. The definition of a vector space 
is given in Section 3.3. 

Though row vectors take up less space, the definition of matrix multiplication makes 
column vectors more convenient, so we usually work with them. To save space, we sometimes 
use the matrix transpose to write a column vector in the form $(a_1, \ldots, a_n)^t$. As mentioned 
in Chapter 1, we don't distinguish a column vector from the point of $\mathbb{R}^n$ with the same 
coordinates. Column vectors will often be denoted by lowercase letters such as v or w, and 
if v is equal to $(a_1, \ldots, a_n)^t$, we call $(a_1, \ldots, a_n)^t$ the coordinate vector of v. 


We consider two operations on vectors: 


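(3.1.1) 

vector addition: $\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} + \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{bmatrix}$, and 

scalar multiplication: $c \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} ca_1 \\ \vdots \\ ca_n \end{bmatrix}$. 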


These operations make $\mathbb{R}^n$ into a vector space. 


A subset W of $\mathbb{R}^n$ (3.1.1) is a subspace if it has these properties: 


(3.1.2) 

(a) If w and w′ are in W, then w + w′ is in W. 
(b) If w is in W and c is in $\mathbb{R}$, then cw is in W. 
(c) The zero vector is in W. 


There is another way to state the conditions for a subspace: 


(3.1.3) W is not empty, and if $w_1, \ldots, w_n$ are elements of W and $c_1, \ldots, c_n$ are scalars, 
the linear combination $c_1w_1 + \cdots + c_nw_n$ is also in W. 


Systems of homogeneous linear equations provide examples. Given an m × n matrix 
A with coefficients in $\mathbb{R}$, the set of vectors in $\mathbb{R}^n$ whose coordinate vectors solve the 
homogeneous equation AX = 0 is a subspace, called the nullspace of A. Though this is very 
simple, we'll check the conditions for a subspace: 


• AX = 0 and AY = 0 imply A(X + Y) = 0: If X and Y are solutions, so is X + Y. 
• AX = 0 implies AcX = 0: If X is a solution, so is cX. 
• A0 = 0: The zero vector is a solution. 
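
The three conditions can be checked numerically on a small case. The sketch below is only an illustration: the matrix A, the solutions X and Y, and the helper names are assumptions chosen for the example, not data from the text.

```python
# An assumed 2 x 3 integer matrix; its second row is twice the first.
A = [[1, 2, -1],
     [2, 4, -2]]

def matvec(A, X):
    """Multiply the matrix A (list of rows) by the column vector X (list)."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def add(X, Y):
    """Vector addition, entry by entry."""
    return [x + y for x, y in zip(X, Y)]

def scale(c, X):
    """Scalar multiplication, entry by entry."""
    return [c * x for x in X]

X = [1, 0, 1]    # a solution of AX = 0
Y = [2, -1, 0]   # another solution
zero = [0, 0, 0]

print(matvec(A, X), matvec(A, Y))   # both are the zero vector of R^2
print(matvec(A, add(X, Y)))         # closed under addition
print(matvec(A, scale(5, X)))       # closed under scalar multiplication
print(matvec(A, zero))              # the zero vector is a solution
```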


The zero space W = {0} and the whole space W = $\mathbb{R}^n$ are subspaces. A subspace is proper 
if it is not one of these two. The next proposition describes the proper subspaces of $\mathbb{R}^2$. 


Proposition 3.1.4 Let W be a proper subspace of the space $\mathbb{R}^2$, and let w be a nonzero 
vector in W. Then W consists of the scalar multiples cw of w. Distinct proper subspaces 
have only the zero vector in common. 


The subspace consisting of the scalar multiples cw of a given nonzero vector w is called the 
subspace spanned by w. Geometrically, it is a line through the origin in the plane $\mathbb{R}^2$. 


Proof of the proposition. We note first that a subspace W that is spanned by a nonzero 
vector w is also spanned by any other nonzero vector w′ that it contains. This is true 
because if w′ = cw with c ≠ 0, then any multiple aw can also be written in the form $ac^{-1}w'$. 
Consequently, if two subspaces $W_1$ and $W_2$ that are spanned by vectors $w_1$ and $w_2$ have a 
nonzero element v in common, then they are equal. 

Next, a subspace W of $\mathbb{R}^2$, not the zero space, contains a nonzero element $w_1$. Since 
W is a subspace, it contains the space $W_1$ spanned by $w_1$, and if $W_1 = W$, then W consists 
of the scalar multiples of one nonzero vector. We show that if W is not equal to $W_1$, then it 
is the whole space $\mathbb{R}^2$. Let $w_2$ be an element of W not in $W_1$, and let $W_2$ be the subspace 
spanned by $w_2$. Since $W_1 \neq W_2$, these subspaces intersect only in 0. So neither of the two 
vectors $w_1$ and $w_2$ is a multiple of the other. Then the coordinate vectors, call them $A_i$, of $w_i$ 
aren't proportional, and the 2 × 2 block matrix $A = [A_1 | A_2]$ with these vectors as columns has 
a nonzero determinant. In that case we can solve the equation AX = B for the coordinate 
vector B of an arbitrary vector v, obtaining the linear combination $v = w_1x_1 + w_2x_2$. This 
shows that W is the whole space $\mathbb{R}^2$. □ 
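
The last step of the proof, solving AX = B when the determinant is nonzero, can be carried out explicitly. The sketch below uses Cramer's rule for a 2 × 2 system, which is one standard way to do it and is our choice rather than a method named in the text; the function name and the sample vectors are also assumptions.

```python
from fractions import Fraction

def coefficients(w1, w2, v):
    """Return (x1, x2) with v = x1*w1 + x2*w2, for vectors in the plane.

    Uses Cramer's rule on the matrix A = [w1 | w2]; raises ValueError
    when w1 and w2 are proportional (determinant zero).
    """
    det = w1[0] * w2[1] - w2[0] * w1[1]
    if det == 0:
        raise ValueError("w1 and w2 are proportional")
    x1 = Fraction(v[0] * w2[1] - w2[0] * v[1], det)
    x2 = Fraction(w1[0] * v[1] - v[0] * w1[1], det)
    return x1, x2

w1, w2 = (1, 2), (3, 1)          # assumed sample vectors, not proportional
x1, x2 = coefficients(w1, w2, (0, 5))
# Rebuild the vector from the coefficients to confirm the combination.
rebuilt = (x1 * w1[0] + x2 * w2[0], x1 * w1[1] + x2 * w2[1])
print(x1, x2, rebuilt)
```

Exact rational arithmetic is used so that the check v = x1·w1 + x2·w2 holds without rounding error.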


It can also be seen geometrically from the parallelogram law for vector addition that 
every vector is a linear combination $c_1w_1 + c_2w_2$. 
[Figure: parallelogram law, showing the vectors $w_1$, $c_1w_1$, $c_2w_2$, and their sum $c_1w_1 + c_2w_2$.] 


The description of subspaces of $\mathbb{R}^2$ that we have given is clarified in Section 3.4 by the 
concept of dimension. 


3.2 FIELDS 


As mentioned at the beginning of Chapter 1, essentially all that was said about matrix 
operations is true for complex matrices as well as for real ones. Many other number systems 
serve equally well. To describe these number systems, we list the properties of the "scalars" 
that are needed, and are led to the concept of a field. We introduce fields here before turning 
to vector spaces, the main topic of the chapter. 

Subfields of the field C of complex numbers are the simplest fields to describe. A 
subfield of C is a subset that is closed under the four operations of addition, subtraction, 
multiplication, and division, and which contains 1. In other words, F is a subfield of C if it 
has these properties: 


(3.2.1) (+, −, ×, ÷, 1) 

• If a and b are in F, then a + b is in F. 

• If a is in F, then −a is in F. 

• If a and b are in F, then ab is in F. 

• If a is in F and a ≠ 0, then $a^{-1}$ is in F. 

• 1 is in F. 

These axioms imply that 1 − 1 = 0 is an element of F. Another way to state them is to say 
that F is a subgroup of the additive group $\mathbb{C}^+$, and that the nonzero elements of F form a 
subgroup of the multiplicative group $\mathbb{C}^\times$. 

Some examples of subfields of C: 
(a) the field R of real numbers, 
(b) the field Q of rational numbers (fractions of integers), 


(c) the field $\mathbb{Q}[\sqrt{2}]$ of all complex numbers of the form $a + b\sqrt{2}$, with rational numbers 
a and b. 
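
Closure of $\mathbb{Q}[\sqrt{2}]$ under multiplication and division can be made explicit. The sketch below is an illustration of our own, not taken from the text: it stores $a + b\sqrt{2}$ as a pair (a, b) of rationals and uses the identity $(a + b\sqrt{2})(a - b\sqrt{2}) = a^2 - 2b^2$, where the right side is nonzero for (a, b) ≠ (0, 0) because $\sqrt{2}$ is irrational. The function names are assumptions.

```python
from fractions import Fraction

def mul(z, w):
    """Multiply a + b*sqrt(2) by c + d*sqrt(2), each stored as a pair."""
    a, b = z
    c, d = w
    return (a * c + 2 * b * d, a * d + b * c)

def inverse(z):
    """Inverse of a nonzero a + b*sqrt(2): (a - b*sqrt(2)) / (a^2 - 2b^2)."""
    a, b = Fraction(z[0]), Fraction(z[1])
    norm = a * a - 2 * b * b
    if norm == 0:
        raise ZeroDivisionError("only 0 has no inverse")
    return (a / norm, -b / norm)

z = (Fraction(1), Fraction(1))       # the element 1 + sqrt(2)
print(inverse(z))                    # again of the form a + b*sqrt(2)
print(mul(z, inverse(z)))            # the element 1, stored as (1, 0)
```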


The concept of an abstract field is only slightly harder to grasp than that of a subfield, 
and it contains important new classes of fields, including finite fields. 
Definition 3.2.2 A field F is a set together with two laws of composition 


$F \times F \xrightarrow{+} F$ and $F \times F \xrightarrow{\times} F$ 

called addition: $a, b \rightsquigarrow a + b$ and multiplication: $a, b \rightsquigarrow ab$, which satisfy these axioms: 

(i) Addition makes F into an abelian group $F^+$; its identity element is denoted by 0. 


(ii) Multiplication is commutative, and it makes the set of nonzero elements of F into an 
abelian group $F^\times$; its identity element is denoted by 1. 


(iii) distributive law: For all a, b, and c in F, a(b + c) = ab + ac. 


The first two axioms describe properties of the two laws of composition, addition and 
multiplication, separately. The third axiom, the distributive law, relates the two laws. 

You will be familiar with the fact that the real numbers satisfy these axioms, but the fact 
that they are the only ones needed for the usual algebraic operations can only be understood 
after some experience. 

The next lemma explains how the zero element multiplies. 


Lemma 3.2.3 Let F be a field. 


(a) The elements 0 and 1 of F are distinct. 
(b) For all ain F, a0 = 0 and 0a = 0. 
(c) Multiplication in F is associative, and 1 is an identity element. 


Proof. (a) Axiom (ii) implies that 1 is not equal to 0. 


(b) Since 0 is the identity for addition, 0 + 0 = 0. Then a0 + a0 = a(0 + 0) = a0. Since $F^+$ 
is a group, we can cancel a0 to obtain a0 = 0, and then 0a = 0 as well. 


(c) Since F − {0} is an abelian group, multiplication is associative when restricted to this 
subset. We need to show that a(bc) = (ab)c when at least one of the elements is zero. In 
that case, (b) shows that the products in question are equal to zero. Finally, the element 1 is 
an identity on F − {0}. Setting a = 1 in (b) shows that 1 is an identity on all of F. □ 


Aside from subfields of the complex numbers, the simplest examples of fields are 
certain finite fields called prime fields, which we describe next. We saw in the previous 
chapter that the set Z/nZ of congruence classes modulo an integer n has laws of addition 
and multiplication derived from addition and multiplication of integers. All of the axioms 
for a field hold for the integers, except for the existence of multiplicative inverses. And as 
noted in Section 2.9, such axioms carry over to addition and multiplication of congruence 
classes. But the integers aren’t closed under division, so there is no reason to suppose that 
congruence classes have multiplicative inverses. In fact they needn’t. The class of 2, for 
example, has no multiplicative inverse modulo 6. It is somewhat surprising that when p is a 
prime integer, all nonzero congruence classes modulo p have inverses, and therefore the set 
$\mathbb{Z}/p\mathbb{Z}$ is a field. This field is called a prime field, and is often denoted by $\mathbb{F}_p$. 
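
The difference between a composite and a prime modulus shows up in a direct search. The sketch below is a brute-force illustration, with a function name of our own choosing; it is meant only for small moduli and is not an efficient method.

```python
def inverse_mod(a, n):
    """Return b in {1, ..., n-1} with a*b congruent to 1 modulo n, or None."""
    for b in range(1, n):
        if (a * b) % n == 1:
            return b
    return None

# Modulo 6 only the classes of 1 and 5 are invertible; the class of 2 is not.
units_mod_6 = [a for a in range(1, 6) if inverse_mod(a, 6) is not None]
print(inverse_mod(2, 6), units_mod_6)

# Modulo a prime, every nonzero class has an inverse.
for p in (2, 3, 5, 7, 11, 13):
    print(p, all(inverse_mod(a, p) is not None for a in range(1, p)))
```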

Using bar notation and choosing the usual representative elements for the p congruence 
classes, 


(3.2.4) $\mathbb{F}_p = \{\bar{0}, \bar{1}, \ldots, \overline{p-1}\} = \mathbb{Z}/p\mathbb{Z}$. 


Theorem 3.2.5 Let p be a prime integer. Every nonzero congruence class modulo p has a 
multiplicative inverse, and therefore $\mathbb{F}_p$ is a field of order p. 


We discuss the theorem before giving the proof. 
If a and b are integers, then $\bar a \neq \bar 0$ means that p does not divide a, and $\bar a \bar b = \bar 1$ means
$ab \equiv 1$ modulo p. The theorem can be stated in terms of congruence in this way:


Let p be a prime, and let a be an integer not divisible by p. 


(3.2.6)    There is an integer b such that $ab \equiv 1$ modulo p.


Finding the inverse of a congruence class $\bar a$ modulo p can be done by trial and error if p is
small. A systematic way is to compute the powers of $\bar a$. If p = 13 and $\bar a = \bar 3$, then $\bar a^2 = \bar 9$ and
$\bar a^3 = \overline{27} = \bar 1$. We are lucky: $\bar a$ has order 3, and therefore $\bar a^{-1} = \bar a^2 = \bar 9$. On the other hand,
the powers of $\bar 6$ run through every nonzero congruence class modulo 13. Computing powers
may not be the fastest way to find the inverse of $\bar 6$. But the theorem tells us that the set $\mathbb{F}_p^\times$ of
nonzero congruence classes forms a group. So every element $\bar a$ of $\mathbb{F}_p^\times$ has finite order, and if
$\bar a$ has order r, its inverse will be $\bar a^{\,r-1}$.
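
The method of computing powers can be sketched in a few lines. This is an illustration under the assumption that p is prime and a is not divisible by p; the names `order` and `inverse_by_powers` are hypothetical helpers, not notation from the text.

```python
def order(a, p):
    """Multiplicative order of a modulo p: the least r >= 1 with a^r = 1 (mod p)."""
    if a % p == 0:
        raise ValueError("a must be nonzero modulo p")
    r, power = 1, a % p
    while power != 1:
        power = (power * a) % p
        r += 1
    return r

def inverse_by_powers(a, p):
    """If a has order r, then a * a^(r-1) = a^r = 1, so a^(r-1) is the inverse."""
    return pow(a, order(a, p) - 1, p)

print(order(3, 13), inverse_by_powers(3, 13))   # the lucky case discussed above
print(order(6, 13), inverse_by_powers(6, 13))   # 6 needs many more powers
```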
To make a proof of the theorem using this reasoning, we need the cancellation law: 


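Proposition 3.2.7 Cancellation Law. Let p be a prime integer, and let $\bar a$, $\bar b$, and $\bar c$ be
elements of $\mathbb{F}_p$.


(a) If $\bar a \bar b = \bar 0$, then $\bar a = \bar 0$ or $\bar b = \bar 0$.
(b) If $\bar a \neq \bar 0$ and if $\bar a \bar b = \bar a \bar c$, then $\bar b = \bar c$.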


Proof. (a) We represent the congruence classes $\bar a$ and $\bar b$ by integers a and b, and we translate
into congruence. The assertion to be proved is that if p divides ab then p divides a or p
divides b. This is Corollary 2.3.7.


(b) It follows from (a) that if $\bar a \neq \bar 0$ and $\bar a(\bar b - \bar c) = \bar 0$, then $\bar b - \bar c = \bar 0$. □


Proof of Theorem (3.2.5). Let $\bar a$ be a nonzero element of $\mathbb{F}_p$. We consider the powers
$\bar 1, \bar a, \bar a^2, \bar a^3, \ldots$ Since there are infinitely many exponents and only finitely many elements
in $\mathbb{F}_p$, there must be two powers that are equal, say $\bar a^m = \bar a^n$, where m < n. We cancel $\bar a^m$
from both sides: $\bar 1 = \bar a^{\,n-m}$. Then $\bar a^{\,n-m-1}$ is the inverse of $\bar a$. □


It will be convenient to drop the bars over the letters in what follows, trusting
ourselves to remember whether we are working with integers or with congruence classes, 
and remembering the rule (2.9.8): 


If a and b are integers, then a = b in $\mathbb{F}_p$ means $a \equiv b$ modulo p.


As with congruences in general, computation in the field $\mathbb{F}_p$ can be done by working
with integers, except that division cannot be carried out in the integers. One can operate
with matrices A whose entries are in a field, and the discussion of Chapter 1 can be
repeated with no essential change.

Suppose we ask for solutions of a system of n linear equations in n unknowns in
the prime field $\mathbb{F}_p$. We represent the system of equations by an integer system, choosing
representatives for the congruence classes, say AX = B, where A is an n × n integer matrix
and B is an integer column vector. To solve the system in $\mathbb{F}_p$, we invert the matrix A
modulo p. The formula $\operatorname{cof}(A)A = \delta I$, where $\delta = \det A$ (Theorem 1.6.9), is valid for integer
matrices, so it also holds in $\mathbb{F}_p$ when the matrix entries are replaced by their congruence
classes. If the congruence class of $\delta$ isn’t zero, we can invert the matrix A in $\mathbb{F}_p$ by computing
$\delta^{-1}\operatorname{cof}(A)$.


Corollary 3.2.8 Let AX = B be a system of n linear equations in n unknowns, where the
entries of A and B are in $\mathbb{F}_p$, and let $\delta = \det A$. If $\delta$ is not zero, the system has a unique
solution in $\mathbb{F}_p$. □


Consider, for example, the system AX = B, where


$A = \begin{bmatrix} 8 & 3 \\ 2 & 6 \end{bmatrix}$ and $B = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$.


The coefficients are integers, so AX = B defines a system of equations in $\mathbb{F}_p$ for any prime
p. The determinant of A is 42, so the system has a unique solution in $\mathbb{F}_p$ for all p that do
not divide 42, i.e., all p different from 2, 3, and 7. For instance, det A = 3 when evaluated
modulo 13. Since $3^{-1} = 9$ in $\mathbb{F}_{13}$,


$A^{-1} = \begin{bmatrix} 2 & 12 \\ 8 & 7 \end{bmatrix}$ and $X = A^{-1}B = \begin{bmatrix} 7 \\ 4 \end{bmatrix}$, modulo 13.


The system has no solution in $\mathbb{F}_2$ or $\mathbb{F}_3$. It happens to have solutions in $\mathbb{F}_7$, though det A = 0
modulo 7.
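
The computation can be replayed numerically. The sketch below takes the example to be A = [[8, 3], [2, 6]] and B = (3, −1)^t, an assumption that agrees with the stated determinant 42 and with the behavior described modulo 2, 3, 7, and 13. The helper names are illustrative, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
def solve_2x2_mod_p(A, B, p):
    """Solve AX = B over F_p via A^{-1} = det^{-1} * cof(A), for a 2x2 integer matrix.
    Returns None when det A is zero modulo p, where the formula does not apply."""
    (a, b), (c, d) = A
    det = (a * d - b * c) % p
    if det == 0:
        return None
    det_inv = pow(det, -1, p)                      # inverse of det in F_p
    cof = [[d, -b], [-c, a]]                       # cofactor matrix of a 2x2 matrix
    A_inv = [[(det_inv * e) % p for e in row] for row in cof]
    X = [sum(A_inv[i][j] * B[j] for j in range(2)) % p for i in range(2)]
    return A_inv, X

def brute_force_solutions(A, B, p):
    """All (x1, x2) in F_p^2 satisfying both equations; used when det A = 0 mod p."""
    return [(x, y) for x in range(p) for y in range(p)
            if all((A[i][0] * x + A[i][1] * y - B[i]) % p == 0 for i in range(2))]

A = [[8, 3], [2, 6]]   # assumed example matrix
B = [3, -1]            # assumed right-hand side

print(solve_2x2_mod_p(A, B, 13))
for p in (2, 3, 7):
    print(p, len(brute_force_solutions(A, B, p)))
```

The brute-force search is only practical for small p; it is included to show what happens when the determinant vanishes.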

Invertible matrices with entries in the prime field $\mathbb{F}_p$ provide new examples of finite
groups, the general linear groups over finite fields:


$GL_n(\mathbb{F}_p)$ = {n × n invertible matrices with entries in $\mathbb{F}_p$}
$SL_n(\mathbb{F}_p)$ = {n × n matrices with entries in $\mathbb{F}_p$ and with determinant 1}


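For example, the group of invertible 2 × 2 matrices with entries in $\mathbb{F}_2$ contains the six
elements


(3.2.9)    $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}.$


This group is isomorphic to the symmetric group $S_3$. The matrices have been listed in an
order that agrees with our usual list $\{1, x, x^2, y, xy, x^2y\}$ of the elements of $S_3$.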
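
A short enumeration confirms the count of six and the defining relations of $S_3$. This is a sketch; the particular matrices chosen for x and y below are one possible assignment, stated as an assumption, and `gl2` and `mat_mul` are illustrative names.

```python
from itertools import product

def mat_mul(X, Y, p):
    """Product of two 2x2 matrices with entries reduced modulo p."""
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2)) % p
                       for j in range(2)) for i in range(2))

def gl2(p):
    """All 2x2 matrices over F_p whose determinant is nonzero modulo p."""
    return [((a, b), (c, d)) for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p != 0]

I = ((1, 0), (0, 1))
x = ((1, 1), (1, 0))   # assumed element of order 3
y = ((0, 1), (1, 0))   # assumed element of order 2

x2 = mat_mul(x, x, 2)
print(len(gl2(2)))                                  # number of elements of GL_2(F_2)
print(mat_mul(x2, x, 2) == I, mat_mul(y, y, 2) == I)
print(mat_mul(y, x, 2) == mat_mul(x2, y, 2))        # the relation yx = x^2 y
```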


One property of the prime fields $\mathbb{F}_p$ that distinguishes them from subfields of $\mathbb{C}$ is that
adding 1 to itself a certain number of times, in fact p times, gives zero. The characteristic of
a field F is the order of 1, as an element of the additive group $F^+$, provided that the order
is finite. It is the smallest positive integer m such that the sum 1 + ··· + 1 of m copies of
1 evaluates to zero. If the order is infinite, that is, 1 + ··· + 1 is never 0 in F, the field is,
somewhat perversely, said to have characteristic zero. Thus subfields of $\mathbb{C}$ have characteristic
zero, while the prime field $\mathbb{F}_p$ has characteristic p.


Lemma 3.2.10 The characteristic of any field F is either zero or a prime number. 

Proof. To avoid confusion, we let $\bar 0$ and $\bar 1$ denote the additive and the multiplicative identities
in the field F, respectively, and if k is a positive integer, we let $\bar k$ denote the sum of k copies
of $\bar 1$. Suppose that the characteristic m is not zero. Then $\bar 1$ generates a cyclic subgroup H of
$F^+$ of order m, and $\bar m = \bar 0$. The distinct elements of the cyclic subgroup H generated by $\bar 1$
are the elements $\bar k$ with k = 0, 1, ..., m − 1 (Proposition 2.4.2). Suppose that m isn’t prime,
say m = rs, with 1 < r, s < m. Then $\bar r$ and $\bar s$ are in the multiplicative group $F^\times = F - \{0\}$,
but the product $\bar r \bar s$, which is equal to $\bar 0$, is not in $F^\times$. This contradicts the fact that $F^\times$ is a
group. Therefore m must be prime. □


The prime fields F, have another remarkable property: 


Theorem 3.2.11 Structure of the Multiplicative Group. Let p be a prime integer. The 
multiplicative group $\mathbb{F}_p^\times$ of the prime field is a cyclic group of order p − 1.


We defer the proof of this theorem to Chapter 15, where we prove that the multiplicative 
group of every finite field is cyclic (Theorem 15.7.3). 


• A generator for the cyclic group $\mathbb{F}_p^\times$ is called a primitive root modulo p.


There are two primitive roots modulo 7, namely 3 and 5, and four primitive roots
modulo 11. Dropping bars, the powers $3^0, 3^1, 3^2, \ldots$ of the primitive root 3 modulo 7 list the
nonzero elements of $\mathbb{F}_7$ in the following order:


(3.2.12)    $\mathbb{F}_7^\times = \{1, 3, 2, 6, 4, 5\} = \{1, 3, 2, -1, -3, -2\}.$


Thus there are two ways to list the nonzero elements of $\mathbb{F}_p$, additively and multiplicatively.
If w is a primitive root modulo p,


(3.2.13)    $\mathbb{F}_p^\times = \{1, 2, 3, \ldots, p-1\} = \{1, w, w^2, \ldots, w^{p-2}\}.$
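
The two listings can be produced by direct computation. The following is a minimal sketch for small primes; `powers` and `primitive_roots` are illustrative names, and the search simply tests whether the powers of a exhaust the nonzero classes.

```python
def powers(a, p):
    """List a^0, a^1, ..., a^(p-2) modulo p."""
    out, x = [], 1
    for _ in range(p - 1):
        out.append(x)
        x = (x * a) % p
    return out

def primitive_roots(p):
    """Residues whose powers run through all p - 1 nonzero classes modulo p."""
    return [a for a in range(1, p) if len(set(powers(a, p))) == p - 1]

print(primitive_roots(7))
print(primitive_roots(11))
print(powers(3, 7))        # the multiplicative listing of the nonzero classes mod 7
```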


3.3 VECTOR SPACES


Having some examples and the concept of a field, we proceed to the definition of a vector 
space. 


Definition 3.3.1 A vector space V over a field F is a set together with two laws of 
composition: 
(a) addition: V × V → V, written v, w ⇝ v + w, for v and w in V,
(b) scalar multiplication by elements of the field: F × V → V, written c, v ⇝ cv, for c in
F and v in V.
These laws are required to satisfy the following axioms:
• Addition makes V into a commutative group $V^+$, with identity denoted by 0.
• 1v = v, for all v in V.
• associative law: (ab)v = a(bv), for all a and b in F and all v in V.
• distributive laws: (a + b)v = av + bv and a(v + w) = av + aw, for all a and b in
F and all v and w in V.

The space $F^n$ of column vectors with entries in the field F forms a vector space over F,
when addition and scalar multiplication are defined as usual (3.1.1).
Some more examples of real vector spaces (vector spaces over R): 


Examples 3.3.2 


(a) Let V = $\mathbb{C}$ be the set of complex numbers. Forget about multiplication of two complex
numbers. Remember only addition α + β and multiplication rα of a complex number α
by a real number r. These operations make V into a real vector space.

(b) The set of real polynomials $p(x) = a_n x^n + \cdots + a_0$ is a real vector space, with
addition of polynomials and multiplication of polynomials by real numbers as its laws of
composition.

(c) The set of continuous real-valued functions on the real line is a real vector space, with
addition of functions f + g and multiplication of functions by real numbers as its laws
of composition.


(d) The set of solutions of the differential equation $\frac{d^2y}{dt^2} = -y$ is a real vector space. □


Each of our examples has more structure than we look at when we view it as a vector space. 
This is typical. Any particular example is sure to have extra features that distinguish it from 
others, but this isn’t a drawback. On the contrary, the strength of the abstract approach lies 
in the fact that consequences of the axioms can be applied in many different situations. 


Two important concepts, subspace and isomorphism, are analogous to subgroups and 
isomorphisms of groups. As with subspaces of $\mathbb{R}^n$, a subspace W of a vector space V
over a field F is a nonempty subset closed under the operations of addition and scalar 
multiplication. A subspace W is proper if it is neither the whole space V nor the zero 
subspace {0}. For example, the space of solutions of the differential equation (3.3.2)(d) is a 
proper subspace of the space of all continuous functions on the real line. 


Proposition 3.3.3 Let $V = F^2$ be the vector space of column vectors with entries in a field
F. Every proper subspace W of V consists of the scalar multiples {cw} of a single nonzero
vector w. Distinct proper subspaces have only the zero vector in common.


The proof of Proposition 3.1.4 carries over. □


Example 3.3.4 Let F be the prime field $\mathbb{F}_p$. The space $F^2$ contains $p^2$ vectors, $p^2 - 1$
of which are nonzero. Because there are p − 1 nonzero scalars, the subspace W = {cw}
spanned by a nonzero vector w will contain p − 1 nonzero vectors. Therefore $F^2$ contains
$(p^2 - 1)/(p - 1) = p + 1$ proper subspaces. □
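
The count p + 1 can be confirmed by enumerating the subspaces {cw} directly. This is a sketch for small primes only; the function name `lines_through_origin` is an illustrative choice.

```python
def lines_through_origin(p):
    """Set of one-dimensional subspaces of F_p^2, each stored as a frozenset of vectors."""
    subspaces = set()
    for x in range(p):
        for y in range(p):
            if (x, y) != (0, 0):
                # The subspace {cw} spanned by w = (x, y), with c running through F_p.
                subspaces.add(frozenset(((c * x) % p, (c * y) % p) for c in range(p)))
    return subspaces

for p in (2, 3, 5, 7):
    print(p, len(lines_through_origin(p)))   # compare with p + 1
```

Each nonzero vector lies in exactly one of these subspaces, which is why dividing $p^2 - 1$ by p − 1 gives the count.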


An isomorphism from a vector space V to a vector space V′, both over the same field
F, is a bijective map φ: V → V′ compatible with the two laws of composition, a bijective
map such that

(3.3.5)    $\varphi(v + w) = \varphi(v) + \varphi(w)$ and $\varphi(cv) = c\varphi(v)$,


for all v and w in V and all c in F.

Examples 3.3.6


(a) Let $F^{m \times n}$ denote the set of m × n matrices with entries in a field F. This set is a vector
space over F, and it is isomorphic to the space of column vectors of length mn.

(b) If we view the set of complex numbers as a real vector space, as in (3.3.2)(a), the map
$\varphi: \mathbb{R}^2 \to \mathbb{C}$ sending $(a, b)^t \rightsquigarrow a + bi$ is an isomorphism. □


3.4 BASES AND DIMENSION 


We discuss the terminology used when working with the operations of addition and scalar 
multiplication in a vector space. The new concepts are span, independence, and basis. 

We work with ordered sets of vectors here. We put curly brackets around unordered 
sets, and we enclose ordered sets with round brackets in order to make the distinction clear. 
Thus the ordered set (v, w) is different from the ordered set (w, v), whereas the unordered 
sets {v, w} and {w, v} are equal. Repetitions are allowed in an ordered set. So (v, v, w) is 
an ordered set, and it is different from (v, w), in contrast to the convention for unordered 
sets, where {v, v, w} and {v, w} denote the same sets. 


• Let V be a vector space over a field F, and let $S = (v_1, \ldots, v_n)$ be an ordered set of
elements of V. A linear combination of S is a vector of the form


(3.4.1)    $w = c_1v_1 + \cdots + c_nv_n$, with $c_i$ in F.


It is convenient to allow scalars to appear on either side of a vector. We simply agree
that if v is a vector and c is a scalar, then the notations vc and cv stand for the same vector,
the one obtained by scalar multiplication. So $v_1c_1 + \cdots + v_nc_n = c_1v_1 + \cdots + c_nv_n$.

Matrix notation provides a compact way to write a linear combination, and the way we
write ordered sets of vectors is chosen with this in mind. Since its entries are vectors, we call
an array $S = (v_1, \ldots, v_n)$ a hypervector. Multiplication of two elements of a vector space
is not defined, but we do have scalar multiplication. This allows us to interpret a product of
the hypervector S and a column vector X in $F^n$, as the matrix product


(3.4.2)    $SX = (v_1, \ldots, v_n)\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = v_1x_1 + \cdots + v_nx_n.$

Evaluating the right side by scalar multiplication and vector addition, we obtain another
vector, a linear combination in which the scalar coefficients $x_i$ are on the right.


We carry along the subspace W of $\mathbb{R}^3$ of solutions of the linear equation

(3.4.3)    $2x_1 - x_2 - 2x_3 = 0$, or AX = 0, where A = (2, −1, −2)


as an example. Two particular solutions $w_1$ and $w_2$ are shown below, together with a linear
combination $w_1y_1 + w_2y_2$.


(3.4.4)    $w_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad w_2 = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \quad w_1y_1 + w_2y_2 = \begin{bmatrix} y_1 + y_2 \\ 2y_2 \\ y_1 \end{bmatrix}.$

If we write $S = (w_1, w_2)$ with $w_i$ as in (3.4.4) and $Y = (y_1, y_2)^t$, then the combination
$w_1y_1 + w_2y_2$ can be written in matrix form as SY.


• The set of all vectors that are linear combinations of $S = (v_1, \ldots, v_n)$ forms a subspace
of V, called the subspace spanned by the set.


As in Section 3.1, this span is the smallest subspace of V that contains S, and it will
often be denoted by Span S. The span of a single vector $(v_1)$ is the space of scalar multiples
$cv_1$ of $v_1$.

One can define span also for an infinite set of vectors. We discuss this in Section 3.7. 
Let’s assume for now that the sets are finite. 


Lemma 3.4.5 Let S be an ordered set of vectors of V, and let W be a subspace of V. If
$S \subset W$, then $\operatorname{Span} S \subset W$. □


The column space of an m × n matrix with entries in F is the subspace of $F^m$ spanned
by the columns of the matrix. It has an important interpretation:


Proposition 3.4.6 Let A be an m × n matrix, and let B be a column vector, both with entries
in a field F. The system of equations AX = B has a solution for X in $F^n$ if and only if B is
in the column space of A.


Proof. Let $A_1, \ldots, A_n$ denote the columns of A. For any column vector $X = (x_1, \ldots, x_n)^t$,
the matrix product AX is the column vector $A_1x_1 + \cdots + A_nx_n$. This is a linear combination
of the columns, an element of the column space, and if AX = B, then B is this linear
combination. □


A linear relation among vectors $v_1, \ldots, v_n$ is any linear combination that evaluates to
zero, that is, any equation of the form


(3.4.7)    $v_1x_1 + v_2x_2 + \cdots + v_nx_n = 0$


that holds in V, where the coefficients $x_i$ are in F. A linear relation can be useful because, if
$x_n$ is not zero, the equation (3.4.7) can be solved for $v_n$.


Definition 3.4.8 An ordered set of vectors $S = (v_1, \ldots, v_n)$ is independent, or linearly
independent, if there is no linear relation SX = 0 except for the trivial one in which X = 0,
i.e., in which all the coefficients $x_i$ are zero. A set that is not independent is dependent.


An independent set S cannot have any repetitions. If two vectors $v_i$ and $v_j$ of S are
equal, then $v_i - v_j = 0$ is a linear relation of the form (3.4.7), the other coefficients being
zero. Also, no vector $v_i$ in an independent set is zero, because if $v_i$ is zero, then $v_i = 0$ is a
linear relation.


Lemma 3.4.9 


(a) A set $(v_1)$ of one vector is independent if and only if $v_1 \neq 0$.
(b) A set $(v_1, v_2)$ of two vectors is independent if neither vector is a multiple of the other.
(c) Any reordering of an independent set is independent. □

Suppose that V is the space $F^m$ and that we know the coordinate vectors of the vectors
in the set $S = (v_1, \ldots, v_n)$. Then the equation SX = 0 gives us a system of m homogeneous
linear equations in the n unknowns $x_i$, and we can decide independence by solving this
system.


Example 3.4.10 Let $S = (v_1, v_2, v_3, v_4)$ be the set of vectors in $\mathbb{R}^3$ whose coordinate vectors
are

(3.4.11)    $A_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}, \quad A_4 = \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix}.$


Let A denote the matrix made up of these column vectors:

(3.4.12)    $A = \begin{bmatrix} 1 & 1 & 2 & 1 \\ 0 & 2 & 1 & 1 \\ 1 & 0 & 2 & 3 \end{bmatrix}.$


A linear combination will have the form $SX = v_1x_1 + v_2x_2 + v_3x_3 + v_4x_4$, and its coordinate
vector will be $AX = A_1x_1 + A_2x_2 + A_3x_3 + A_4x_4$. The homogeneous equation AX = 0 has a
nontrivial solution because it is a system of three homogeneous equations in four unknowns.
So the set S is dependent. On the other hand, the determinant of the 3 × 3 matrix A′ formed
from the first three columns of (3.4.12) is equal to 1, so the equation A′X = 0 has only the
trivial solution. Therefore $(v_1, v_2, v_3)$ is an independent set. □
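
Deciding independence by solving SX = 0 amounts to a rank computation on the coordinate vectors. The following is a minimal sketch over the rationals using exact `Fraction` arithmetic; the names `rank`, `is_independent`, and `det3` are illustrative helpers, not notation from the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by forward Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def is_independent(vectors):
    """Vectors are independent exactly when their rank equals their number."""
    return rank(vectors) == len(vectors)

def det3(M):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A1, A2, A3, A4 = [1, 0, 1], [1, 2, 0], [2, 1, 2], [1, 1, 3]
print(is_independent([A1, A2, A3, A4]))       # four vectors in R^3
print(is_independent([A1, A2, A3]))
print(det3([[1, 1, 2], [0, 2, 1], [1, 0, 2]]))  # the matrix A' of the example
```

The vectors are passed as rows, which is harmless because a matrix and its transpose have the same rank.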


Definition 3.4.13 A basis of a vector space V is a set $(v_1, \ldots, v_n)$ of vectors that is
independent and also spans V.


We will often use a boldface symbol such as B to denote a basis. The set $(v_1, v_2, v_3)$
defined above is a basis of $\mathbb{R}^3$ because the equation A′X = B has a unique solution for all
B (see 1.2.21). The set $(w_1, w_2)$ defined in (3.4.4) is a basis of the space of solutions of the
equation $2x_1 - x_2 - 2x_3 = 0$, though we haven’t verified this.


Proposition 3.4.14 The set $B = (v_1, \ldots, v_n)$ is a basis of V if and only if every vector w in
V can be written in a unique way as a combination $w = v_1x_1 + \cdots + v_nx_n = BX$.


Proof. The definition of independence can be restated by saying that the zero vector can be
written as a linear combination in just one way. If every vector can be written uniquely as a
combination, then B is independent, and spans V, so it is a basis. Conversely, suppose that B
is a basis. Then every vector w in V can be written as a linear combination. Suppose that w
is written as a combination in two ways, say w = BX = BX′. Let Y = X − X′. Then BY = 0.
This is a linear relation among the vectors $v_1, \ldots, v_n$, which are independent. Therefore
X − X′ = 0. The two combinations are the same. □


Let $V = F^n$ be the space of column vectors. As before, $e_i$ denotes the column vector
with 1 in the ith position and zeros elsewhere (see (1.1.24)). The set $E = (e_1, \ldots, e_n)$ is
a basis for $F^n$ called the standard basis. If the coordinate vector of a vector v in $F^n$ is
$X = (x_1, \ldots, x_n)^t$, then $v = EX = e_1x_1 + \cdots + e_nx_n$ is the unique expression for v in terms
of the standard basis.


We now discuss the main facts that relate the three concepts, of span, independence, 
and basis. The most important one is Theorem 3.4.18. 


Proposition 3.4.15 Let $S = (v_1, \ldots, v_n)$ be an ordered set of vectors, let w be any vector in
V, and let S′ = (S, w) be the set obtained by adding w to S.


(a) Span S = Span S′ if and only if w is in Span S.
(b) Suppose that S is independent. Then S′ is independent if and only if w is not in Span S.


Proof. This is very elementary, so we omit most of the proof. We show only that if S is
independent but S′ is not, then w is in the span of S. If S′ is dependent, there is some linear
relation

$v_1x_1 + \cdots + v_nx_n + wy = 0$,


in which the coefficients $x_1, \ldots, x_n$ and y are not all zero. If the coefficient y were zero,
the expression would reduce to SX = 0, and since S is assumed to be independent, we could
conclude that X = 0 too. The relation would be trivial, contrary to our hypothesis. So $y \neq 0$,
and then we can solve for w as a linear combination of $v_1, \ldots, v_n$. □


• A vector space V is finite-dimensional if some finite set of vectors spans V. Otherwise, V
is infinite-dimensional.


For the rest of this section, our vector spaces are finite-dimensional. 


Proposition 3.4.16 Let V be a finite-dimensional vector space. 


(a) Let S be a finite subset that spans V, and let L be an independent subset of V. One can 
obtain a basis of V by adding elements of S to L. 

(b) Let S be a finite subset that spans V. One can obtain a basis of V by deleting elements 
from S. 


Proof. (a) If S is contained in Span L, then L spans V, and so it is a basis (3.4.5). If not, 
we choose an element v in S, which is not in Span L. By Proposition 3.4.15, L’ = (L, v) 
is independent. We replace L by L’. Since S is finite, we can do this only finitely often. So 
eventually we will have a basis. 


(b) If S is dependent, there is a linear relation $v_1c_1 + \cdots + v_nc_n = 0$ in which some coefficient,
say $c_n$, is not zero. We can solve this equation for $v_n$, and this shows that $v_n$ is in the span of
the set $S_1$ of the first n − 1 vectors. Proposition 3.4.15(a) shows that Span S = Span $S_1$. So $S_1$
spans V. We replace S by $S_1$. Continuing this way we must eventually obtain a family that is
independent but still spans V: a basis.


Note: There is a problem with this reasoning when V is the zero vector space {0}. Starting 
with an arbitrary set S of vectors in V, all equal to zero, our procedure will throw them 
out one at a time until there is only one vector $v_1$ left. And since $v_1$ is zero, the set $(v_1)$ is
dependent. How can we proceed? The zero space isn’t particularly interesting, but it may
lurk in a corner, ready to trip us up. We have to allow for the possibility that a vector space
that arises in the course of some computation, such as solving a system of homogeneous 
linear equations, is the zero space, though we aren’t aware of this. In order to avoid having 
to mention this possibility as a special case, we adopt the following definitions: 


(3.4.17)
• The empty set is independent.
• The span of the empty set is the zero space {0}.


Then the empty set is a basis for the zero vector space. These definitions allow us to throw
out the last vector $v_1$, which rescues the proof. □
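
The proof suggests a procedure for extracting a basis from a finite spanning set. The sketch below follows the route of part (a) with L initially empty, which is legitimate under convention (3.4.17): each vector of S is kept only if it is not in the span of those already kept. It works over the rationals, and `rank`, `in_span`, and `extract_basis` are hypothetical helper names.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of vectors over the rationals (forward Gaussian elimination)."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def in_span(vectors, w):
    """w is in Span(vectors) exactly when adjoining it does not raise the rank."""
    return rank(vectors + [w]) == rank(vectors)

def extract_basis(S):
    """Return a sublist of S that is independent and has the same span as S."""
    L = []                          # the empty set is independent (3.4.17)
    for v in S:
        if not in_span(L, v):       # Proposition 3.4.15(b): (L, v) stays independent
            L.append(v)
    return L

print(extract_basis([[1, 0, 1], [1, 2, 0], [2, 1, 2], [1, 1, 3]]))
print(extract_basis([[0, 0], [0, 0]]))   # the zero space: the basis is empty
```

Because zero vectors are never kept, the zero space needs no special case, which is the point of the convention.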


We come now to the main fact about independence: 


Theorem 3.4.18 Let S and L be finite subsets of a vector space V. Assume that S spans V
and that L is independent. Then S contains at least as many elements as L does: |S| ≥ |L|.


As before, |S| denotes the order, the number of elements, of the set S.


Proof. Say that $S = (v_1, \ldots, v_m)$ and that $L = (w_1, \ldots, w_n)$. We assume that |S| < |L|,
i.e., that m < n, and we show that L is dependent. To do this, we show that there is a linear
relation $w_1x_1 + \cdots + w_nx_n = 0$, in which the coefficients $x_i$ aren’t all zero. We write this
undetermined relation as LX = 0.

Because S spans V, each element $w_j$ of L is a linear combination of S, say
$w_j = v_1a_{1j} + \cdots + v_ma_{mj} = SA_j$, where $A_j$ is the column vector of coefficients. We assemble
these column vectors into an m × n matrix


(3.4.19)    $A = \begin{bmatrix} A_1 & \cdots & A_n \end{bmatrix}.$

Then

(3.4.20)    $SA = (SA_1, \ldots, SA_n) = (w_1, \ldots, w_n) = L.$

We substitute SA for L into our undetermined linear combination:

LX = (SA)X.


The associative law for scalar multiplication implies that (SA)X = S(AX). The proof is the
same as for the associative law for multiplication of scalar matrices (which we omitted). If
AX = 0, then our combination LX will be zero too. Now since A is an m × n matrix with
m < n, the homogeneous system AX = 0 has a nontrivial solution X. Then LX = 0 is the
linear relation we are looking for. □


Proposition 3.4.21 Let V be a finite-dimensional vector space. 


(a) Any two bases of V have the same order (the same number of elements).
(b) Let B be a basis. If a finite set S of vectors spans V, then |S| ≥ |B|, and |S| = |B| if and
only if S is a basis.
(c) Let B be a basis. If a set L of vectors is independent, then |L| ≤ |B|, and |L| = |B| if and
only if L is a basis.


Proof. (a) We note here that two finite bases $B_1$ and $B_2$ have the same order, and we will
show in Corollary 3.7.7 that every basis of a finite-dimensional vector space is finite. Taking
$S = B_1$ and $L = B_2$ in Theorem 3.4.18 shows that $|B_1| \geq |B_2|$, and similarly, $|B_2| \geq |B_1|$.
Parts (b) and (c) follow from (a) and Proposition 3.4.16. □


Definition 3.4.22 The dimension of a finite-dimensional vector space V is the number of 
vectors in a basis. This dimension will be denoted by dim V. 


The dimension of the space $F^n$ of column vectors is n because the standard basis
$E = (e_1, \ldots, e_n)$ contains n elements.


Proposition 3.4.23 If W is a subspace of a finite-dimensional vector space V, then W is
finite-dimensional, and dim W ≤ dim V. Moreover, dim W = dim V if and only if W = V.


Proof. We start with any independent set L of vectors in W, possibly the empty set. If L
doesn’t span W, we choose a vector w in W not in the span of L. Then L′ = (L, w) will be
independent (3.4.15). We replace L by L′.

Now it is obvious that if L is an independent subset of W, then it is also independent
when thought of as a subset of V. So Theorem 3.4.18 tells us that |L| ≤ dim V. Therefore
the process of adding elements to L must come to an end, and when it does, we will have a
basis of W. Since L contains at most dim V elements, dim W ≤ dim V. If |L| = dim V, then
Proposition 3.4.21(c) shows that L is a basis of V, and therefore W = V. □


3.5 COMPUTING WITH BASES 


The purpose of bases is to provide a method of computation, and we learn to use them in 
this section. We consider two topics: how to express a vector in terms of a basis, and how to 
relate different bases of the same vector space. 

Suppose we are given a basis $B = (v_1, \ldots, v_n)$ of a vector space V over F. Remember:
This means that every vector v in V can be expressed as a combination


(3.5.1)    $v = v_1x_1 + \cdots + v_nx_n$, with $x_i$ in F,

in exactly one way (3.4.14). The scalars $x_i$ are the coordinates of v, and the column vector

(3.5.2)    $X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$

is the coordinate vector of v, with respect to the basis B.

For example, (cos t, sin t) is a basis of the space of solutions of the differential equation
y″ = −y. Every solution of this differential equation is a linear combination of this basis. If
we are given another solution f(t), the coordinate vector $(x_1, x_2)^t$ of f is the vector such
that $f(t) = (\cos t)x_1 + (\sin t)x_2$. Obviously, we need to know something about f to find X.
Not very much: just enough to determine two coefficients. Most properties of f are implicit
in the fact that it solves the differential equation.

What we can always do, given a basis B of a vector space of dimension n, is to define
an isomorphism of vector spaces (see 3.3.5) from the space $F^n$ to V:


(3.5.3)    $\psi: F^n \to V$ that sends $X \rightsquigarrow BX$.

We will often denote this isomorphism by B, because it sends a vector X to BX.


Proposition 3.5.4 Let $S = (v_1, \ldots, v_n)$ be a subset of a vector space V, and let $\psi: F^n \to V$
be the map defined by ψ(X) = SX. Then


(a) ψ is injective if and only if S is independent,
(b) ψ is surjective if and only if S spans V, and
(c) ψ is bijective if and only if S is a basis of V.


This follows from the definitions of independence, span, and basis. □


Given a basis, the coordinate vector of a vector v in V is obtained by inverting the map
ψ (3.5.3). We won’t have a formula for the inverse function unless the basis is given more
explicitly, but the existence of the isomorphism is interesting:


Corollary 3.5.5 Every vector space V of dimension n over a field F is isomorphic to the
space $F^n$ of column vectors. □


Notice also that $F^n$ is not isomorphic to $F^m$ when m ≠ n, because $F^n$ has a basis of n
elements, and the number of elements in a basis depends only on the vector space. Thus the
finite-dimensional vector spaces over a field F are completely classified. The spaces $F^n$ of
column vectors are representative elements for the isomorphism classes.


The fact that a vector space of dimension n is isomorphic to $F^n$ will allow us to
translate problems on vector spaces to the familiar algebra of column vectors, once a basis
is chosen. Unfortunately, the same vector space V will have many bases. Identifying V with
the isomorphic space $F^n$ is useful when a natural basis is in hand, but not when a basis is
poorly suited to a given problem. In that case, we will need to change coordinates, i.e., to
change the basis.

The space of solutions of a homogeneous linear equation AX = 0, for instance, almost
never has a natural basis. The space W of solutions of the equation $2x_1 - x_2 - 2x_3 = 0$
has dimension 2, and we exhibited a basis before: $B = (w_1, w_2)$, where $w_1 = (1, 0, 1)^t$ and
$w_2 = (1, 2, 0)^t$ (see (3.4.4)). Using this basis, we obtain an isomorphism of vector spaces
$\mathbb{R}^2 \to W$ that we may denote by B. Since the unknowns in the equation are labeled $x_i$, we
need to choose another symbol for variable elements of $\mathbb{R}^2$ here. We’ll use $Y = (y_1, y_2)^t$.
The isomorphism B sends Y to the coordinate vector of $BY = w_1y_1 + w_2y_2$ that was
displayed in (3.4.4).

However, there is nothing very special about the two particular solutions $w_1$ and $w_2$.
Most other pairs of solutions would serve just as well. The solutions $w'_1 = (0, 2, -1)^t$ and
$w'_2 = (1, 4, -1)^t$ give us a second basis $B' = (w'_1, w'_2)$ of W. Either basis suffices to express
the solutions uniquely. A solution can be written in either one of the forms

(3.5.6)    $\begin{bmatrix} y_1 + y_2 \\ 2y_2 \\ y_1 \end{bmatrix}$ or $\begin{bmatrix} y'_2 \\ 2y'_1 + 4y'_2 \\ -y'_1 - y'_2 \end{bmatrix}.$

Change of Basis 


Suppose that we are given two bases of the same vector space V, say $B = (v_1, \ldots, v_n)$ and
$B' = (v'_1, \ldots, v'_n)$. We wish to make two computations. We ask first: How are the two bases
related? Second, a vector v in V will have coordinates with respect to each of these bases,
but they will be different. So we ask: How are the two coordinate vectors related? These are
the basechange computations, and they will be very important in later chapters. They can
also drive you nuts if you don’t organize the notation carefully.

Let’s think of B as the old basis and B′ as a new basis. We note that every vector of the
new basis B′ is a linear combination of the old basis B. We write this combination as


(3.5.7)    $v'_j = v_1p_{1j} + v_2p_{2j} + \cdots + v_np_{nj}.$

The column vector $P_j = (p_{1j}, \ldots, p_{nj})^t$ is the coordinate vector of the new basis vector
$v'_j$ when it is computed using the old basis. We collect these column vectors into a square
matrix P, obtaining the matrix equation B′ = BP:


(3.5.8)    $B' = (v'_1, \ldots, v'_n) = (v_1, \ldots, v_n)P = BP.$


The jth column of P is the coordinate vector of the new basis vector $v'_j$ with respect to the
old basis. This matrix P is the basechange matrix.¹


Proposition 3.5.9 


(a) Let B and B’ be two bases of a vector space V. The basechange matrix P is an invertible 
matrix that is determined uniquely by the two bases B and B’. 

(b) Let $B = (v_1, \ldots, v_n)$ be a basis of a vector space V. The other bases are the sets of the
form B′ = BP, where P can be any invertible n × n matrix.


Proof. (a) The equation B′ = BP expresses the basis vectors $v'_j$ as linear combinations of
the basis B. There is just one way to do this (3.4.14), so P is unique. To show that P is
an invertible matrix, we interchange the roles of B and B′. There is a matrix Q such that
B = B′Q. Then


$B = B'Q = BPQ$, or $(v_1, \ldots, v_n) = (v_1, \ldots, v_n)PQ$.


This equation expresses each $v_i$ as a combination of the vectors $(v_1, \ldots, v_n)$. The entries
of the product matrix PQ are the coefficients. But since B is a basis, there is just one way to
write $v_i$ as a combination of $(v_1, \ldots, v_n)$, namely $v_i = v_i$, or in matrix notation, B = BI. So
PQ = I.


¹ This basechange matrix is the inverse of the one that was used in the first edition.


(b) We must show that if B is a basis and if P is an invertible matrix, then B′ = BP is also a
basis. Since P is invertible, $B = B'P^{-1}$. This tells us that the vectors $v_i$ are in the span of B′.
Therefore B′ spans V, and since it has the same number of elements as B, it is a basis. □


Let X and X′ be the coordinate vectors of the same arbitrary vector v, computed with
respect to the two bases B and B′, respectively, that is, v = BX and v = B′X′. Substituting
$B = B'P^{-1}$ gives us the matrix equation


(3.5.10)    $v = BX = B'P^{-1}X.$


This shows that the coordinate vector of v with respect to the new basis B′, which we call X′,
is $P^{-1}X$. We can also write this as X = PX′.

Recapitulating, we have a single matrix P, the basechange matrix, with the dual 
properties 


(3.5.11) B’=BP and PX’=X, 


where X and X’ denote the coordinate vectors of the same arbitrary vector v, with respect 
to the two bases. Each of these properties characterizes P. Please take note of the positions 
of P in the two relations. 
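
The two relations in (3.5.11) can be checked numerically. The following is a minimal sketch, assuming a hypothetical pair of bases of $\mathbb{R}^2$ chosen only for illustration (the numbers are not from the text). Each basis is stored as a matrix whose columns are the basis vectors in standard coordinates, and the helper names `matmul` and `inverse_2x2` are assumptions of this sketch.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(M):
    """Inverse of an invertible 2x2 matrix, computed with exact fractions."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical old basis B and new basis B' of R^2; columns are the basis vectors.
B = [[1, 1], [0, 1]]        # v_1 = (1,0)^t,  v_2 = (1,1)^t
B_new = [[3, 1], [1, 1]]    # v'_1 = (3,1)^t, v'_2 = (1,1)^t

# B' = BP determines P uniquely: P = B^{-1} B'.
P = matmul(inverse_2x2(B), B_new)

# Take a vector v with coordinate vector X' with respect to B', so v = B'X'.
X_new = [[2], [-1]]
v = matmul(B_new, X_new)

# The coordinate vector with respect to the old basis is X = PX'.
X = matmul(P, X_new)
```

In this sketch P multiplies the old basis on the right (B' = BP) but multiplies the new coordinates on the left (X = PX'), which is the point about the positions of P made above.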


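Going back once more to the equation $2x_1 - x_2 - 2x_3 = 0$, let B and B' be the bases
of the space W of solutions described above, in (3.5.6). The basechange matrix solves the
equation


$$\begin{bmatrix} 0 & 1 \\ 2 & 4 \\ -1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix}. \quad \text{It is} \quad P = \begin{bmatrix} -1 & -1 \\ 1 & 2 \end{bmatrix}.$$


The coordinate vectors Y and Y' of a given vector v with respect to these two bases, the ones
that appear in (3.5.6), are related by the equation


$$\begin{bmatrix} -1 & -1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} y'_1 \\ y'_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}.$$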

Another example: Let B be the basis (cos t, sin t) of the space of solutions of the differential
equation $\frac{d^2y}{dt^2} = -y$. If we allow complex valued functions, then the exponential functions
$e^{\pm it} = \cos t \pm i \sin t$ are also solutions, and $B' = (e^{it}, e^{-it})$ is a new basis of the space of
solutions. The basechange computation is


(3.5.12)    $(e^{it}, e^{-it}) = (\cos t, \sin t) \begin{bmatrix} 1 & 1 \\ i & -i \end{bmatrix}.$


One case in which the basechange matrix is easy to determine is that V is the space
$F^n$ of column vectors, the old basis is the standard basis $E = (e_1, \dots, e_n)$, and the new
basis, we'll denote it by $B = (v_1, \dots, v_n)$ here, is arbitrary. Let the coordinate vector of $v_j$,
with respect to the standard basis, be the column vector $B_j$. So $v_j = EB_j$. We assemble these
column vectors into an n×n matrix that we denote by [B]:

(3.5.13)    $[B] = \begin{bmatrix} | & & | \\ B_1 & \cdots & B_n \\ | & & | \end{bmatrix}.$ Then $(v_1, \dots, v_n) = (e_1, \dots, e_n) \begin{bmatrix} | & & | \\ B_1 & \cdots & B_n \\ | & & | \end{bmatrix},$


i.e., B = E[B]. Therefore [B] is the basechange matrix from the standard basis E to B.


3.6 DIRECT SUMS 


The concepts of independence and span of a set of vectors have analogues for subspaces. 


If $W_1, \dots, W_k$ are subspaces of a vector space V, the set of vectors v that can be written
as a sum

(3.6.1)    $v = w_1 + \cdots + w_k,$


where $w_i$ is in $W_i$, is called the sum of the subspaces or their span, and is denoted by
$W_1 + \cdots + W_k$:


(3.6.2)    $W_1 + \cdots + W_k = \{v \in V \mid v = w_1 + \cdots + w_k, \text{ with } w_i \text{ in } W_i\}.$


The sum of the subspaces is the smallest subspace that contains all of the subspaces
$W_1, \dots, W_k$. It is analogous to the span of a set of vectors.

The subspaces $W_1, \dots, W_k$ are called independent if no sum $w_1 + \cdots + w_k$ with $w_i$ in
$W_i$ is zero, except for the trivial sum, in which $w_i = 0$ for all i. In other words, the spaces are
independent if


(3.6.3)    $w_1 + \cdots + w_k = 0$, with $w_i$ in $W_i$, implies $w_i = 0$ for all i.

Note: Suppose that $v_1, \dots, v_k$ are elements of V, and let $W_i$ be the span of the vector
$v_i$. Then the subspaces $W_1, \dots, W_k$ are independent if and only if the set $(v_1, \dots, v_k)$ is
independent. This becomes clear if we compare (3.4.8) and (3.6.3). The statement in terms
of subspaces is actually the neater one, because scalar coefficients don't need to be put in
front of the vectors $w_i$ in (3.6.3). Since each of the subspaces $W_i$ is closed under scalar
multiplication, a scalar multiple $cw_i$ is simply another element of $W_i$. □


We omit the proof of the next proposition. 


Proposition 3.6.4 Let $W_1, \dots, W_k$ be subspaces of a finite-dimensional vector space V, and
let $B_i$ be a basis of $W_i$.


(a) The following conditions are equivalent:
• The subspaces $W_i$ are independent, and the sum $W_1 + \cdots + W_k$ is equal to V.
• The set $B = (B_1, \dots, B_k)$ obtained by appending the bases $B_i$ is a basis of V.

(b) $\dim(W_1 + \cdots + W_k) \le \dim W_1 + \cdots + \dim W_k$, with equality if and only if the spaces
are independent.

(c) If $W'_i$ is a subspace of $W_i$ for $i = 1, \dots, k$, and if the spaces $W_1, \dots, W_k$ are independent,
then so are the $W'_1, \dots, W'_k$. □


If the conditions of Proposition 3.6.4(a) are satisfied, we say that V is the direct sum of
$W_1, \dots, W_k$, and we write $V = W_1 \oplus \cdots \oplus W_k$:


(3.6.5)    $V = W_1 \oplus \cdots \oplus W_k$ if $W_1 + \cdots + W_k = V$ and $W_1, \dots, W_k$ are independent.


If V is the direct sum, every vector v in V can be written in the form (3.6.1) in exactly one
way.


Proposition 3.6.6 Let $W_1$ and $W_2$ be subspaces of a finite-dimensional vector space V.


(a) $\dim W_1 + \dim W_2 = \dim(W_1 \cap W_2) + \dim(W_1 + W_2)$.

(b) $W_1$ and $W_2$ are independent if and only if $W_1 \cap W_2 = \{0\}$.

(c) V is the direct sum $W_1 \oplus W_2$ if and only if $W_1 \cap W_2 = \{0\}$ and $W_1 + W_2 = V$.

(d) If $W_1 + W_2 = V$, there is a subspace $W'_2$ of $W_2$ such that $W_1 \oplus W'_2 = V$.


Proof. We prove the key part (a): We choose a basis, $U = (u_1, \dots, u_k)$ for $W_1 \cap W_2$, and we
extend it to a basis $(U, V) = (u_1, \dots, u_k; v_1, \dots, v_m)$ of $W_1$. We also extend U to a basis
$(U, W) = (u_1, \dots, u_k; w_1, \dots, w_n)$ of $W_2$. Then $\dim(W_1 \cap W_2) = k$, $\dim W_1 = k + m$,
and $\dim W_2 = k + n$. The assertion will follow if we prove that the set of k + m + n elements
$(U, V, W) = (u_1, \dots, u_k; v_1, \dots, v_m; w_1, \dots, w_n)$ is a basis of $W_1 + W_2$.

We must show that (U, V, W) is independent and spans $W_1 + W_2$. An element v of
$W_1 + W_2$ has the form w' + w'' where w' is in $W_1$ and w'' is in $W_2$. We write w' in terms of
our basis (U, V) for $W_1$, say $w' = UX + VY = u_1x_1 + \cdots + u_kx_k + v_1y_1 + \cdots + v_my_m$. We
also write w'' as a combination UX' + WZ of our basis (U, W) for $W_2$. Then v = w' + w'' =
U(X + X') + VY + WZ.

Next, suppose we are given a linear relation UX + VY + WZ = 0, among the elements
(U, V, W). We write this as UX + VY = −WZ. The left side of this equation is in $W_1$ and the
right side is in $W_2$. Therefore −WZ is in $W_1 \cap W_2$, and so it is a linear combination UX' of the
basis U. This gives us an equation UX' + WZ = 0. Since the set (U, W) is a basis for $W_2$, it is
independent, and therefore X' and Z are zero. The given relation reduces to UX + VY = 0.
But (U, V) is also an independent set. So X and Y are zero. The relation was trivial. □
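
A small worked instance of part (a) may help. The subspaces below are an illustrative choice, not taken from the text: two coordinate planes in $\mathbb{R}^3$ that share a line, with the bases U, V, W chosen as in the proof.

```latex
\begin{align*}
W_1 &= \operatorname{Span}(e_1, e_2), \qquad W_2 = \operatorname{Span}(e_2, e_3), \\
W_1 \cap W_2 &= \operatorname{Span}(e_2), \qquad W_1 + W_2 = \mathbb{R}^3, \\
U &= (e_2), \quad V = (e_1), \quad W = (e_3), \qquad k = m = n = 1, \\
\dim W_1 + \dim W_2 &= 2 + 2 = 4 = 1 + 3 = \dim(W_1 \cap W_2) + \dim(W_1 + W_2).
\end{align*}
```

Here $(U, V, W) = (e_2; e_1; e_3)$ is a basis of $W_1 + W_2$, as the proof predicts. The two planes are not independent, because their intersection is not {0}.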


3.7 INFINITE-DIMENSIONAL SPACES 


Vector spaces that are too big to be spanned by any finite set of vectors are called infinite- 
dimensional. We won’t need them very often, but they are important in analysis, so we 
discuss them briefly here. 

One of the simplest examples of an infinite-dimensional space is the space $\mathbb{R}^\infty$ of
infinite real row vectors


(3.7.1)    $(a) = (a_1, a_2, a_3, \dots).$


An infinite vector can be thought of as a sequence $a_1, a_2, \dots$ of real numbers.

The space $\mathbb{R}^\infty$ has many infinite-dimensional subspaces. Here are a few; you will be
able to make up some more:

Examples 3.7.2


(a) Convergent sequences: $C = \{(a) \in \mathbb{R}^\infty \mid \text{the limit } \lim_{n \to \infty} a_n \text{ exists}\}$.

(b) Absolutely convergent series: $\ell^1 = \{(a) \in \mathbb{R}^\infty \mid \sum_{1}^{\infty} |a_n| < \infty\}$.

(c) Sequences with finitely many terms different from zero:
$Z = \{(a) \in \mathbb{R}^\infty \mid a_n = 0 \text{ for all but finitely many } n\}$.


Now suppose that V is a vector space, infinite-dimensional or not. What do we mean
by the span of an infinite set S of vectors? It isn't always possible to assign a value to
an infinite combination $c_1v_1 + c_2v_2 + \cdots$. If V is the vector space $\mathbb{R}^n$, then a value can
be assigned provided that the series $c_1v_1 + c_2v_2 + \cdots$ converges. But many series don't
converge, and then we don't know what value to assign. In algebra it is customary to speak
only of combinations of finitely many vectors. The span of an infinite set S is defined to be
the set of the vectors v that are combinations of finitely many elements of S:


(3.7.3)    $v = c_1v_1 + \cdots + c_rv_r$, where $v_1, \dots, v_r$ are in S.


The vectors $v_i$ in S can be arbitrary, and the number r is allowed to depend on the vector v
and to be arbitrarily large:


(3.7.4)    Span S = { finite combinations of elements of S }.

For example, let $e_i = (0, \dots, 0, 1, 0, \dots)$ be the row vector in $\mathbb{R}^\infty$ with 1 in the ith
position as its only nonzero coordinate. Let $E = (e_1, e_2, e_3, \dots)$ be the set of these vectors.
This set does not span $\mathbb{R}^\infty$, because the vector


$w = (1, 1, 1, \dots)$


is not a (finite) combination. The span of the set E is the subspace Z (3.7.2)(c).

A set S, finite or infinite, is independent if there is no finite linear relation


(3.7.5)    $c_1v_1 + \cdots + c_rv_r = 0$, with $v_1, \dots, v_r$ in S,


except for the trivial relation in which $c_1 = \cdots = c_r = 0$. Again, the number r is allowed to
be arbitrary, that is, the condition has to hold for arbitrarily large r and arbitrary elements
$v_1, \dots, v_r$ of S. For example, the set $S' = (w; e_1, e_2, e_3, \dots)$ is independent, if w and $e_i$ are
the vectors defined above. With this definition of independence, Proposition 3.4.15 continues
to be true.

As with finite sets, a basis S of V is an independent set that spans V. The set
$S = (e_1, e_2, \dots)$ is a basis of the space Z. The monomials $x^i$ form a basis for the space
of polynomials. It can be shown, using Zorn's Lemma or the Axiom of Choice, that every
vector space V has a basis (see the appendix, Proposition A.3.3). However, a basis for $\mathbb{R}^\infty$
will have uncountably many elements, and cannot be made very explicit.

Let us go back for a moment to the case that our vector space V is finite-dimensional 
(3.4.16), and ask if there can be an infinite basis. We saw in (3.4.21) that any two finite bases 
have the same number of elements. We complete the picture now, by showing that every 
basis is finite. This follows from the next lemma. 


Lemma 3.7.6 Let V be a finite-dimensional vector space, and let S be any set that spans V.
Then S contains a finite subset that spans V.


Proof. By hypothesis, there is a finite set, say $(u_1, \dots, u_m)$, that spans V. Because S spans
V, each of the vectors $u_i$ is a linear combination of finitely many elements of S. The elements
of S that we use to write all of these vectors as linear combinations make up a finite subset
S' of S. Then the vectors $u_i$ are in Span S', and since $(u_1, \dots, u_m)$ spans V, so does S'. □


Corollary 3.7.7 Let V be a finite-dimensional vector space.


• Every basis is finite.
• Every set S that spans V contains a basis.
• Every independent set L is finite, and can be extended to a basis. □


I don't need to learn 8 + 7: I'll remember 8 + 8 and subtract 1.
—T. Cuyler Young, Jr. 


EXERCISES 


Section 1 Fields 


1.1. Prove that the numbers of the form $a + b\sqrt{2}$, where a and b are rational numbers, form a
subfield of $\mathbb{C}$.


1.2. Find the inverse of 5 modulo p, for p = 7, 11, 13, and 17. 


1.3. Compute the product polynomial $(x^3 + 3x^2 + 3x + 1)(x^4 + 4x^3 + 6x^2 + 4x + 1)$ when the
coefficients are regarded as elements of the field $\mathbb{F}_7$. Explain your answer.


1.4. Consider the system of linear equations BS | el |e 
2 6]| x2 1 


(a) Solve the system in $\mathbb{F}_p$ when p = 5, 11, and 17.
(b) Determine the number of solutions when p = 7. 


1.5. Determine the primes p such that the matrix


$A = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & -1 \\ 2 & 0 & 2 \end{bmatrix}$


is invertible, when its entries are considered to be in $\mathbb{F}_p$.

1.6. Solve completely the systems of linear equations AX = 0 and AX = B, where


(a) in $\mathbb{Q}$, (b) in $\mathbb{F}_2$, (c) in $\mathbb{F}_3$, (d) in $\mathbb{F}_7$.

1.7. By finding primitive elements, verify that the multiplicative group $\mathbb{F}_p^\times$ is cyclic for all primes
p < 20.

1.8. Let p be a prime integer. 


(a) Prove Fermat's Theorem: For every integer a, $a^p \equiv a$ modulo p.
(b) Prove Wilson's Theorem: $(p - 1)! \equiv -1$ (modulo p).


19. Determine the orders of the matrices i ; and ir 1 in the group GL2(F7). 


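1.10. Interpreting matrix entries in the field $\mathbb{F}_2$, prove that the four matrices


$\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$

form a field.
Hint: You can cut the work down by using the fact that various laws are known to hold for
addition and multiplication of matrices.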


1.11. Prove that the set of symbols $\{a + bi \mid a, b \in \mathbb{F}_3\}$ forms a field with nine elements, if the
laws of composition are made to mimic addition and multiplication of complex numbers.
Will the same method work for $\mathbb{F}_5$? For $\mathbb{F}_7$? Explain.


Section 2 Vector Spaces


2.1. (a) Prove that the scalar product of a vector with the zero element of the field F is the 
zero vector. 


(b) Prove that if w is an element of a subspace W, then -w is in W too. 


2.2. Which of the following subsets is a subspace of the vector space $F^{n \times n}$ of n×n matrices
with coefficients in F?
(a) symmetric matrices ($A = A^t$), (b) invertible matrices, (c) upper triangular matrices.


Section 3 Bases and Dimension


3.1. Find a basis for the space of n×n symmetric matrices ($A^t = A$).


3.2. Let $W \subset \mathbb{R}^4$ be the space of solutions of the system of linear equations AX = 0, where
$A = \begin{bmatrix} 2 & 1 & 2 & 3 \\ 1 & 1 & 3 & 0 \end{bmatrix}$. Find a basis for W.


3.3. Prove that the three functions $x^2$, $\cos x$, and $e^x$ are linearly independent.


3.4. Let A be an m×n matrix, and let A' be the result of a sequence of elementary row
operations on A. Prove that the rows of A span the same space as the rows of A'.


3.5. Let $V = F^n$ be the space of column vectors. Prove that every subspace W of V is the
space of solutions of some system of homogeneous linear equations AX = 0.


3.6. Find a basis of the space of solutions in $\mathbb{R}^n$ of the equation
$x_1 + 2x_2 + 3x_3 + \cdots + nx_n = 0.$


3.7. Let $(X_1, \dots, X_m)$ and $(Y_1, \dots, Y_n)$ be bases for $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively. Do the mn
matrices $X_iY_j^t$ form a basis for the vector space $\mathbb{R}^{m \times n}$ of all m×n matrices?


3.8. Prove that a set $(v_1, \dots, v_n)$ of vectors in $F^n$ is a basis if and only if the matrix obtained
by assembling the coordinate vectors of $v_i$ is invertible.


Section 4 Computing with Bases


4.1. (a) Prove that the set $B = ((1, 2, 0)^t, (2, 1, 2)^t, (3, 1, 1)^t)$ is a basis of $\mathbb{R}^3$.

(b) Find the coordinate vector of the vector $v = (1, 2, 3)^t$ with respect to this basis.

(c) Let $B' = ((0, 1, 0)^t, (1, 0, 1)^t, (2, 1, 0)^t)$. Determine the basechange matrix P from B
to B'.


4.2. (a) Determine the basechange matrix in $\mathbb{R}^2$, when the old basis is the standard basis
$E = (e_1, e_2)$ and the new basis is $B = (e_1 + e_2, e_1 - e_2)$.

(b) Determine the basechange matrix in $\mathbb{R}^n$, when the old basis is the standard basis E
and the new basis is $B = (e_n, e_{n-1}, \dots, e_1)$.

(c) Let B be the basis of $\mathbb{R}^2$ in which $v_1 = e_1$ and $v_2$ is a vector of unit length making an
angle of 120° with $v_1$. Determine the basechange matrix that relates E to B.


4.3. Let $B = (v_1, \dots, v_n)$ be a basis of a vector space V. Prove that one can get from B to any
other basis B' by a finite sequence of steps of the following types:
(i) Replace $v_i$ by $v_i + av_j$, $i \ne j$, for some a in F,
(ii) Replace $v_i$ by $cv_i$ for some $c \ne 0$,
(iii) Interchange $v_i$ and $v_j$.


4.4. Let $\mathbb{F}_p$ be a prime field, and let $V = \mathbb{F}_p^2$. Prove:

(a) The number of bases of V is equal to the order of the general linear group $GL_2(\mathbb{F}_p)$.

(b) The order of the general linear group $GL_2(\mathbb{F}_p)$ is $p(p + 1)(p - 1)^2$, and the order of
the special linear group $SL_2(\mathbb{F}_p)$ is $p(p + 1)(p - 1)$.


4.5. How many subspaces of each dimension are there in (a) $\mathbb{F}_p^3$, (b) $\mathbb{F}_p^4$?


Section 5 Direct Sums 


5.1. Prove that the space $\mathbb{R}^{n \times n}$ of all n×n real matrices is the direct sum of the space of
symmetric matrices ($A^t = A$) and the space of skew-symmetric matrices ($A^t = -A$).


5.2. The trace of a square matrix is the sum of its diagonal entries. Let $W_1$ be the space of n×n
matrices whose trace is zero. Find a subspace $W_2$ so that $\mathbb{R}^{n \times n} = W_1 \oplus W_2$.


5.3. Let $W_1, \dots, W_k$ be subspaces of a vector space V, such that $V = \sum W_i$. Assume that
$W_1 \cap W_2 = 0$, $(W_1 + W_2) \cap W_3 = 0$, ..., $(W_1 + W_2 + \cdots + W_{k-1}) \cap W_k = 0$. Prove that
V is the direct sum of the subspaces $W_1, \dots, W_k$.


Section 6 Infinite-Dimensional Spaces


6.1. Let E be the set of vectors $(e_1, e_2, \dots)$ in $\mathbb{R}^\infty$, and let $w = (1, 1, 1, \dots)$. Describe the
span of the set $(w, e_1, e_2, \dots)$.


6.2. The doubly infinite row vectors $(a) = (\dots, a_{-1}, a_0, a_1, \dots)$, with $a_i$ real, form a vector
space. Prove that this space is isomorphic to $\mathbb{R}^\infty$.


*6.3. For every positive integer p, we can define the space $\ell^p$ to be the space of sequences such
that $\sum |a_i|^p < \infty$. Prove that $\ell^p$ is a proper subspace of $\ell^{p+1}$.


*6.4. Let V be a vector space that is spanned by a countably infinite set. Prove that every
independent subset of V is finite or countably infinite.


Miscellaneous Problems 


M.1. Consider the determinant function $\det: F^{2 \times 2} \to F$, where $F = \mathbb{F}_p$ is the prime field of
order p and $F^{2 \times 2}$ is the space of 2×2 matrices. Show that this map is surjective, that all
nonzero values of the determinant are taken on the same number of times, but that there
are more matrices with determinant 0 than with determinant 1.


M.2. Let A be a real n×n matrix. Prove that there is an integer N such that A satisfies a
nontrivial polynomial relation $A^N + c_{N-1}A^{N-1} + \cdots + c_1A + c_0 = 0$.


M.3. (polynomial paths) (a) Let x(t) and y(t) be quadratic polynomials with real coefficients.
Prove that the image of the path (x(t), y(t)) is contained in a conic, i.e., that there is a real
quadratic polynomial f(x, y) such that f(x(t), y(t)) is identically zero.

(b) Let $x(t) = t^2 - 1$ and $y(t) = t^3 - t$. Find a nonzero real polynomial f(x, y) such that
f(x(t), y(t)) is identically zero. Sketch the locus {f(x, y) = 0} and the path (x(t), y(t))
in $\mathbb{R}^2$.

(c) Prove that every pair x(t), y(t) of real polynomials satisfies some real polynomial
relation f(x, y) = 0.


*M.4. Let V be a vector space over an infinite field F. Prove that V is not the union of finitely
many proper subspaces.


*M.5. Let α be the real cube root of 2.

(a) Prove that $(1, \alpha, \alpha^2)$ is an independent set over $\mathbb{Q}$, i.e., that there is no relation of the
form $a + b\alpha + c\alpha^2 = 0$ with integers a, b, c.

Hint: Divide $x^3 - 2$ by $cx^2 + bx + a$.

(b) Prove that the real numbers $a + b\alpha + c\alpha^2$ with a, b, c in $\mathbb{Q}$ form a field.


M.6. (Tabasco sauce: a mathematical diversion) My cousin Phil collects hot sauce. He has about
a hundred different bottles on the shelf, and many of them, Tabasco for instance, have only
three ingredients other than water: chilis, vinegar, and salt. What is the smallest number
of bottles of hot sauce that Phil would need to keep on hand so that he could obtain any
recipe that uses only these three ingredients by mixing the ones he had?


CHAPTER 4 


Linear Operators 


That confusions of thought and errors of reasoning 
still darken the beginnings of Algebra,
is the earnest and just complaint of sober and thoughtful men. 


—Sir William Rowan Hamilton 


4.1 THE DIMENSION FORMULA 
A linear transformation T: V → W from one vector space over a field F to another is a
map that is compatible with addition and scalar multiplication:


(4.1.1)    $T(v_1 + v_2) = T(v_1) + T(v_2)$ and $T(cv_1) = cT(v_1),$


for all $v_1$ and $v_2$ in V and all c in F. This is analogous to a homomorphism of groups, and
calling it a homomorphism would be appropriate too. A linear transformation is compatible
with arbitrary linear combinations:


(4.1.2)    $T\bigl(\sum_i v_ic_i\bigr) = \sum_i T(v_i)\,c_i.$


Left multiplication by an m×n matrix A with entries in F, the map


(4.1.3)    $F^n \xrightarrow{\;A\;} F^m$ that sends $X \mapsto AX$


is a linear transformation. Indeed, $A(X_1 + X_2) = AX_1 + AX_2$, and A(cX) = cAX.

If $B = (v_1, \dots, v_n)$ is a subset of a vector space V over the field F, the map $F^n \to V$
that sends $X \mapsto BX$ is a linear transformation.

Another example: Let $P_n$ be the vector space of real polynomial functions


(4.1.4)    $a_nt^n + a_{n-1}t^{n-1} + \cdots + a_1t + a_0$


of degree at most n. The derivative $\frac{d}{dt}$ defines a linear transformation from $P_n$ to $P_{n-1}$.

There are two important subspaces associated with a linear transformation: its kernel
and its image:

(4.1.5)    ker T = kernel of T = $\{v \in V \mid T(v) = 0\}$,
           im T = image of T = $\{w \in W \mid w = T(v) \text{ for some } v \in V\}$.

The kernel is often called the nullspace of the linear transformation. As one may guess from
the analogy with group homomorphisms, the kernel is a subspace of V and the image is a
subspace of W.

The main result of this section is the next theorem. 


Theorem 4.1.6 Dimension Formula. Let T: V → W be a linear transformation. Then
dim(ker T) + dim(im T) = dim V.


The nullity and the rank of a linear transformation T are the dimensions of the kernel 
and the image, respectively, and the nullity and rank of a matrix A are defined analogously. 
With this terminology, (4.1.6) becomes 


(4.1.7) nullity + rank = dimension of V. 


Proof of Theorem (4.1.6). We'll assume that V is finite-dimensional, say of dimension n. Let
k be the dimension of ker T, and let $(u_1, \dots, u_k)$ be a basis for the kernel. We extend this
set to a basis of V:


(4.1.8)    $(u_1, \dots, u_k; v_1, \dots, v_{n-k})$

(see (3.4.15)). For $i = 1, \dots, n - k$, let $w_i = T(v_i)$. If we prove that $C = (w_1, \dots, w_{n-k})$ is
a basis for the image, it will follow that the image has dimension n − k, and this will prove
the theorem.


We must show that C spans the image and that it is an independent set. Let w be an
element of the image. Then w = T(v) for some v in V. We write v in terms of the basis:


$v = a_1u_1 + \cdots + a_ku_k + b_1v_1 + \cdots + b_{n-k}v_{n-k}$

and apply T, noting that $T(u_i) = 0$:

$w = T(v) = b_1w_1 + \cdots + b_{n-k}w_{n-k}.$


Thus w is in the span of C.

Next, we show that C is independent. Suppose we have a linear relation


(4.1.9)    $c_1w_1 + \cdots + c_{n-k}w_{n-k} = 0.$

Let $v = c_1v_1 + \cdots + c_{n-k}v_{n-k}$, where $v_i$ are the vectors in (4.1.8). Then

$T(v) = c_1w_1 + \cdots + c_{n-k}w_{n-k} = 0,$


so v is in the nullspace. We write v in terms of the basis $(u_1, \dots, u_k)$ of the nullspace, say
$v = a_1u_1 + \cdots + a_ku_k$. Then


$-a_1u_1 - \cdots - a_ku_k + c_1v_1 + \cdots + c_{n-k}v_{n-k} = -v + v = 0.$


But the basis (4.1.8) is independent. So $-a_1 = 0, \dots, -a_k = 0$, and $c_1 = 0, \dots, c_{n-k} = 0$.
The relation (4.1.9) was trivial. Therefore C is independent. □

When T is left multiplication by a matrix A (4.1.3), the kernel of T, the nullspace of A,
is the set of solutions of the homogeneous equation AX = 0. The image of T is the column
space, the space spanned by the columns of A, which is also the set of vectors B in $F^m$ such
that the linear equation AX = B has a solution (3.4.6).

It is a familiar fact that by adding the solutions of the homogeneous equation AX = 0 to
a particular solution $X_0$ of the inhomogeneous equation AX = B, one obtains all solutions of
the inhomogeneous equation. Another way to say this is that the set of solutions of AX = B
is the additive coset $X_0 + N$ of the nullspace N in $F^n$.

An n×n matrix A whose determinant isn't zero is invertible, and the system of
equations AX = B has a unique solution for every B. In this case, the nullspace is {0}, and
the column space is the whole space $F^n$. On the other hand, if the determinant is zero, the
nullspace N has positive dimension, and the image, the column space, has dimension less
than n. Not all equations AX = B have solutions, but those that do have a solution have
more than one solution, because the set of solutions is a coset of N.
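
For left multiplication by a matrix, the dimension formula can be checked by row reduction. The sketch below is one way to do this in Python with exact rational arithmetic; the matrix A and the helper names `rref` and `nullspace_basis` are assumptions made for illustration, not part of the text. The rank is counted as the number of pivot columns, and the nullity as the number of basis vectors found for the solutions of AX = 0.

```python
from fractions import Fraction

def rref(A):
    """Return (R, pivots): the reduced row echelon form of A and its pivot columns."""
    R = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(R), len(R[0])
    pivots = []
    r = 0
    for c in range(cols):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        R[r] = [x / R[r][c] for x in R[r]]          # scale the pivot to 1
        for i in range(rows):
            if i != r and R[i][c] != 0:              # clear the rest of column c
                R[i] = [x - R[i][c] * y for x, y in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return R, pivots

def nullspace_basis(A):
    """Basis of the solutions of AX = 0, one vector for each free (non-pivot) column."""
    R, pivots = rref(A)
    cols = len(A[0])
    basis = []
    for f in (c for c in range(cols) if c not in pivots):
        x = [Fraction(0)] * cols
        x[f] = Fraction(1)
        for row, p in zip(R, pivots):                # the ith pivot sits in row i
            x[p] = -row[f]
        basis.append(x)
    return basis

# Illustrative 3x4 matrix whose third row is the sum of the first two.
A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]
rank = len(rref(A)[1])
kernel = nullspace_basis(A)
nullity = len(kernel)
```

For this A the sketch finds two pivot columns and two kernel vectors, so nullity + rank equals the number of columns, in agreement with (4.1.7).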


4.2 THE MATRIX OF A LINEAR TRANSFORMATION


Every linear transformation from one space of column vectors to another is left multiplication
by a matrix.


Lemma 4.2.1 Let $T: F^n \to F^m$ be a linear transformation between spaces of column
vectors, and let the coordinate vector of $T(e_j)$ be $A_j = (a_{1j}, \dots, a_{mj})^t$. Let A be the m×n
matrix whose columns are $A_1, \dots, A_n$. Then T acts on vectors in $F^n$ as multiplication by A.

Proof. $T(X) = T\bigl(\sum_j e_jx_j\bigr) = \sum_j T(e_j)\,x_j = \sum_j A_jx_j = AX.$ □


For example, let c = cos θ, s = sin θ. Counterclockwise rotation $\rho: \mathbb{R}^2 \to \mathbb{R}^2$ of the
plane through the angle θ about the origin is a linear transformation. Its matrix is


(4.2.2)    $R = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}.$


Let's verify that multiplication by this matrix rotates the plane. We write a vector X in the
form $r(\cos\alpha, \sin\alpha)^t$, where r is the length of X. Let c' = cos α and s' = sin α. The addition
formulas for cosine and sine show that


$RX = r\begin{bmatrix} c & -s \\ s & c \end{bmatrix} \begin{bmatrix} c' \\ s' \end{bmatrix} = r\begin{bmatrix} cc' - ss' \\ sc' + cs' \end{bmatrix} = r\begin{bmatrix} \cos(\theta + \alpha) \\ \sin(\theta + \alpha) \end{bmatrix}.$

So RX is obtained from X by rotating through the angle θ, as claimed.
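
The same verification can be done numerically. This is a minimal sketch in floating-point arithmetic, with the angles and the length r chosen arbitrarily for illustration; the function name `rotate` is an assumption of the sketch.

```python
import math

def rotate(theta, x):
    """Apply the rotation matrix [[c, -s], [s, c]] to the column vector x = (x1, x2)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

# X = r (cos a, sin a)^t with r = 2 and a = 30 degrees, rotated through theta = 60 degrees.
r, alpha, theta = 2.0, math.radians(30), math.radians(60)
X = (r * math.cos(alpha), r * math.sin(alpha))
RX = rotate(theta, X)

# The addition formulas predict r (cos(theta + a), sin(theta + a))^t.
expected = (r * math.cos(theta + alpha), r * math.sin(theta + alpha))
```

Up to rounding error, `RX` agrees with `expected`, and its length is still r.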
One can make a computation analogous to that of Lemma 4.2.1 with any linear
transformation T: V → W, once bases of the two spaces are chosen. If $B = (v_1, \dots, v_n)$ is
a basis of V, we use the shorthand notation T(B) to denote the hypervector


(4.2.3)    $T(B) = (T(v_1), \dots, T(v_n)).$

If $v = BX = v_1x_1 + \cdots + v_nx_n$, then


(4.2.4)    $T(v) = T(v_1)x_1 + \cdots + T(v_n)x_n = T(B)X.$

Proposition 4.2.5 Let T: V → W be a linear transformation, and let $B = (v_1, \dots, v_n)$ and
$C = (w_1, \dots, w_m)$ be bases of V and W, respectively. Let X be the coordinate vector of an
arbitrary vector v with respect to the basis B and let Y be the coordinate vector of its image
T(v). So v = BX and T(v) = CY. There is an m×n matrix A with the dual properties


(4.2.6)    T(B) = CA and AX = Y.


This matrix A is the matrix of the transformation T with respect to the two bases. Either of
the properties (4.2.6) characterizes the matrix.


Proof. We write $T(v_j)$ as a linear combination of the basis C, say


(4.2.7)    $T(v_j) = w_1a_{1j} + \cdots + w_ma_{mj},$

and we assemble the coefficients $a_{ij}$ into a column vector $A_j = (a_{1j}, \dots, a_{mj})^t$, so that
$T(v_j) = CA_j$. Then if A is the matrix whose columns are $A_1, \dots, A_n$,

(4.2.8)    $T(B) = (T(v_1), \dots, T(v_n)) = (w_1, \dots, w_m)\,A = CA,$


as claimed. Next, if v = BX, then

T(v) = T(B)X = CAX.


Therefore the coordinate vector of T(v), which we named Y, is equal to AX. □
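
As an illustration of (4.2.6), the following Python sketch builds the matrix of the derivative $\frac{d}{dt}: P_3 \to P_2$ column by column, as in the proof. It assumes the monomial bases $(1, t, t^2, t^3)$ and $(1, t, t^2)$, listed in increasing degree; that ordering and the helper names `derivative`, `matrix_of`, and `apply` are choices made for this sketch.

```python
def derivative(coeffs):
    """Derivative of a_0 + a_1 t + ... + a_n t^n, given as the list [a_0, ..., a_n]."""
    return [k * coeffs[k] for k in range(1, len(coeffs))]

def matrix_of(T, n_in, n_out):
    """Matrix whose jth column is the coordinate vector of T(v_j), for monomial bases."""
    columns = []
    for j in range(n_in):
        v_j = [1 if i == j else 0 for i in range(n_in)]   # coordinates of the jth basis vector
        image = T(v_j)
        assert len(image) == n_out
        columns.append(image)
    # Turn the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]

def apply(A, X):
    """Compute the product AX for a matrix A (list of rows) and a coordinate vector X."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

# Matrix of d/dt : P_3 -> P_2 with respect to (1, t, t^2, t^3) and (1, t, t^2).
A = matrix_of(derivative, 4, 3)

# The polynomial 5 - t + 4t^3 has coordinate vector X = (5, -1, 0, 4)^t.
X = [5, -1, 0, 4]
Y = apply(A, X)     # coordinate vector of the derivative, -1 + 12t^2
```

The first property, T(B) = CA, is how `matrix_of` constructs A; the second, AX = Y, is what `apply` checks against differentiating the polynomial directly.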


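The isomorphisms $F^n \to V$ and $F^m \to W$ determined by the two bases (3.5.3) help to
explain the relationship between T and A. If we use those isomorphisms to identify V and
W with $F^n$ and $F^m$, then T corresponds to multiplication by A, as shown in the diagram
below:


(4.2.9)
$$\begin{array}{ccc} F^n & \xrightarrow{\;A\;} & F^m \\ {\scriptstyle B}\downarrow & & \downarrow{\scriptstyle C} \\ V & \xrightarrow{\;T\;} & W \end{array} \qquad\qquad \begin{array}{ccc} X & \longmapsto & AX \\ \downarrow & & \downarrow \\ BX & \longmapsto & T(B)X = CAX \end{array}$$


Going from $F^n$ to W along the two paths gives the same answer. A diagram that has this
property is said to be commutative. All diagrams in this book are commutative.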

Thus any linear transformation between finite-dimensional vector spaces V and W 
corresponds to matrix multiplication, once bases for the two spaces are chosen. This is a nice 
result, but if we change bases we can do much better. 
Theorem 4.2.10


(a) Vector space form: Let T: V → W be a linear transformation between finite-dimensional
vector spaces. There are bases B and C of V and W, respectively, such that the matrix
of T with respect to these bases has the form


(4.2.11)    $A' = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix},$


where $I_r$ is the r×r identity matrix and r is the rank of T.
(b) Matrix form: Given an m×n matrix A, there are invertible matrices Q and P such that
$A' = Q^{-1}AP$ has the form shown above.


Proof. (a) Let $(u_1, \dots, u_k)$ be a basis for the kernel of T. We extend this set to a basis B
of V, listing the additional vectors first, say $(v_1, \dots, v_r; u_1, \dots, u_k)$, where r + k = n. Let
$w_i = T(v_i)$. Then, as in the proof of (4.1.6), one sees that $(w_1, \dots, w_r)$ is a basis for the
image of T. We extend this set to a basis C of W, say $(w_1, \dots, w_r; z_1, \dots, z_s)$, listing the
additional vectors last. The matrix of T with respect to these bases has the form (4.2.11).

Part (b) of the theorem can be proved using row and column operations. The proof is
Exercise 2.4. □

This theorem is a prototype for a number of results that are to come. It shows the 
advantage of working in vector spaces without fixed bases (or coordinates), because the 
structure of an arbitrary linear transformation is described by the very simple matrix (4.2.11). 
But why are (a) and (b) considered two versions of the same theorem? To answer this, we 
need to analyze the way that the matrix of a linear transformation changes when we make 
other choices of bases. 

Let A be the matrix of T with respect to bases B and C of V and W, as in (4.2.6), and
let $B' = (v'_1, \dots, v'_n)$ and $C' = (w'_1, \dots, w'_m)$ be new bases for V and W. We can relate the
new basis B' to the old basis B by an invertible n×n matrix P, as in (3.5.11). Similarly, C' is
related to C by an invertible m×m matrix Q. These matrices have the properties


(4.2.12)    B' = BP, PX' = X and C' = CQ, QY' = Y.


Proposition 4.2.13 Let A be the matrix of a linear transformation T with respect to given
bases B and C.


(a) Suppose that new bases B' and C' are related to the given bases by the matrices P and
Q, as above. The matrix of T with respect to the new bases is $A' = Q^{-1}AP$.

(b) The matrices A' that represent T with respect to other bases are those of the form
$A' = Q^{-1}AP$, where Q and P can be any invertible matrices of the appropriate sizes.

Proof. (a) We substitute X = PX' and Y = QY' into the equation Y = AX (4.2.6), obtaining
QY' = APX'. So $Y' = (Q^{-1}AP)X'$. Since A' is the matrix such that A'X' = Y', this shows
that $A' = Q^{-1}AP$. Part (b) follows because the basechange matrices can be any invertible
matrices (3.5.9). □

It follows from the proposition that the two parts of the theorem amount to the same
thing. To derive (a) from (b), we suppose given the linear transformation T, and we begin
with arbitrary choices of bases for V and W, obtaining a matrix A. Part (b) tells us that there
are invertible matrices P and Q such that $A' = Q^{-1}AP$ has the form (4.2.11). When we use
these matrices to change bases in V and W, the matrix A is changed to A'.

To derive (b) from (a), we view an arbitrary matrix A as the matrix of the linear
transformation "left multiplication by A" on column vectors. Then A is the matrix of T with
respect to the standard bases of $F^n$ and $F^m$, and (a) guarantees the existence of P, Q so that
$Q^{-1}AP$ has the form (4.2.11).
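
A small worked example may make the passage from (a) to (b) concrete. The matrix A below is an arbitrary choice made for this sketch. The new bases are built as in the proof of Theorem 4.2.10(a): for the domain, a vector $v_1$ outside the kernel followed by a kernel vector $u_1$; for the target, the image vector $w_1 = Av_1$ followed by one more vector $z_1$. The columns of P and Q are these new basis vectors.

```latex
\begin{align*}
A &= \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}, \qquad
\ker A = \operatorname{Span}\begin{bmatrix} 2 \\ -1 \end{bmatrix}, \qquad r = 1, \\
P &= \begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix} \ (\text{columns } v_1 = e_1,\ u_1), \qquad
Q = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \ (\text{columns } w_1 = Av_1,\ z_1 = e_2), \\
Q^{-1}AP &= \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
\begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 2 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.
\end{align*}
```

The result has the form (4.2.11) with r = 1, the rank of A.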

We also learn something remarkable about matrix multiplication here, because left 
multiplication by a matrix is a linear transformation. Left multiplication by an arbitrary 
matrix A is the same as left multiplication by a matrix of the form (4.2.11), but with reference 
to different coordinates. 

In the future, we will often state a result in two equivalent ways, a vector space form 
and a matrix form, without stopping to show that the two forms are equivalent. Then we will 
present whichever proof seems simpler to write down. 


We can use Theorem 4.2.10 to derive another interesting property of matrix multiplication.
Let N and U denote the nullspace and column space of the transformation
$A: F^n \to F^m$. So N is a subspace of $F^n$ and U is a subspace of $F^m$. Let k and r denote the
dimensions of N and U. So k is the nullity of A and r is its rank.

Left multiplication by the transpose matrix $A^t$ defines a transformation $A^t: F^m \to F^n$
in the opposite direction, and therefore two more subspaces, the nullspace $N_1$ and the
column space $U_1$ of $A^t$. Here $U_1$ is a subspace of $F^n$, and $N_1$ is a subspace of $F^m$. Let
$k_1$ and $r_1$ denote the dimensions of $N_1$ and $U_1$, respectively. Theorem 4.1.6 tells us that
k + r = n, and also that $k_1 + r_1 = m$. Theorem 4.2.14 below gives one more relation among
these integers:


Theorem 4.2.14 With the above notation, r; = r: The rank of a matrix is equal to the rank 
of its transpose. 


Proof, Let P and Q be invertible matrices such that A’ = Q7!AP has the form (4.2.11). 
We begin by noting that the assertion is obvious for the matrix A’. Next, we examine the 
diagrams 


ft 
(4.2.15) pn 4 pm ee 
fof A 
F? AL Fm Fn eh. Fm 


The vertical arrows are bijective maps. Therefore, in the left-hand diagram, Q carries the 
column space of A’ (the image of multiplication by A’) bijectively to the column space of A. 
The dimensions of these two column spaces, the ranks of A and A’, are equal. Similarly, the 
ranks of A! and A” are equal. So to prove the theorem, we may replace the matrix A by A’. 
This reduces the proof to the trivial case of the matrix (4.2.11). O 
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We can reinterpret the rank r; of the transpose matrix A‘. By definition, it is the 
dimension of the space spanned by the columns of A’, and this can equally well be thought 
of as the dimension of the space of row vectors spanned by the rows of A. Because of this, 
people often refer to 7; as the row rank of A, and tor as the column rank. 

The row rank is the maximal number of independent rows of the matrix, and the 
column rank is the maximal number of independent columns. Theorem 4.2.14 can be stated 
this way: 


Corollary 4.2.16 The row rank and the column rank of an m × n matrix A are equal. □
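
The corollary is easy to test on a small example. The sketch below is an illustration, not part of the argument: the helper names `rank` and `transpose` and the sample 3 × 4 matrix are assumptions made here, and the rank is found by row reduction over the rational numbers.

```python
from fractions import Fraction

def rank(M):
    """Rank of a rational matrix, found by row reduction (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear the entries below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def transpose(M):
    return [list(column) for column in zip(*M)]

# Sample matrix (an assumption for illustration): the second row is twice the first.
A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 1, 1, 1]]

m, n = len(A), len(A[0])
r, r1 = rank(A), rank(transpose(A))
print("column rank r =", r, " row rank r1 =", r1)
print("nullity k = n - r =", n - r, " nullity k1 = m - r1 =", m - r1)
```

For this matrix both ranks come out equal, while the two nullities differ because m and n differ.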


4.3 LINEAR OPERATORS


In this section, we study linear transformations T: V → V that map a vector space to itself.
They are called linear operators. Left multiplication by a (square) n × n matrix with entries
in a field F defines a linear operator on the space $F^n$ of column vectors.

For example, let c = cos θ and s = sin θ. The rotation matrix (4.2.2)

$\begin{bmatrix} c & -s \\ s & c \end{bmatrix}$

is a linear operator on the plane $\mathbb{R}^2$.

The dimension formula dim(ker T) + dim(im T) = dim V is valid for linear operators.
But here, since the domain and range are equal, we have extra information that can be
combined with the formula. Both the kernel and the image of T are subspaces of V.


Proposition 4.3.1 Let K and W denote the kernel and image, respectively, of a linear
operator T on a finite-dimensional vector space V.
(a) The following conditions are equivalent:
• T is bijective,
• K = {0},
• W = V.
(b) The following conditions are equivalent:
• V is the direct sum K ⊕ W,
• K ∩ W = {0},
• K + W = V.


Proof. (a) T is bijective if and only if the kernel K is zero and the image W is the whole
space V. If the kernel is zero, the dimension formula tells us that dim W = dim V, and
therefore W = V. Similarly, if W = V, the dimension formula shows that dim K = 0, and
therefore K = {0}. In both cases, T is bijective.

(b) V is the direct sum K ⊕ W if and only if both of the conditions K ∩ W = {0} and
K + W = V hold. If K ∩ W = {0}, then K and W are independent, so the sum U = K + W
is the direct sum K ⊕ W, and dim U = dim K + dim W (3.6.6)(a). The dimension formula
shows that dim U = dim V, so U = V, and this shows that K ⊕ W = V. If K + W = V,
the dimension formula and Proposition 3.6.6(a) show that K and W are independent, and
again, V is the direct sum. □




• A linear operator that satisfies the conditions (4.3.1)(a) is called an invertible operator.
Its inverse function is also a linear operator. An operator that is not invertible is a singular 
operator. 


The conditions of Proposition 4.3.1(a) are not equivalent when the dimension of V
is infinite. For example, let $V = \mathbb{R}^\infty$ be the space of infinite row vectors $(a_1, a_2, \ldots)$ (see
Section 3.7). The kernel of the right shift operator $S^+$, defined by

(4.3.2) $S^+(a_1, a_2, \ldots) = (0, a_1, a_2, \ldots)$,

is the zero space, and its image is a proper subspace of V. The kernel of the left shift operator
$S^-$, defined by

$S^-(a_1, a_2, a_3, \ldots) = (a_2, a_3, \ldots)$,

is a proper subspace of V, and its image is the whole space.
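
The two shift operators can be tried out on a computer if one is willing to model an infinite row vector as a generator. The following is only a sketch under that assumption; the names `right_shift`, `left_shift`, `naturals`, and `head` are invented for this illustration. It shows that applying the left shift after the right shift gives back the original vector, whereas applying them in the other order loses the first coordinate.

```python
from itertools import islice

# An "infinite row vector" is modeled as a zero-argument function that
# returns a fresh iterator over its coordinates a1, a2, a3, ...

def right_shift(a):
    """S+ : (a1, a2, ...) -> (0, a1, a2, ...)."""
    def shifted():
        yield 0
        yield from a()
    return shifted

def left_shift(a):
    """S- : (a1, a2, a3, ...) -> (a2, a3, ...)."""
    def shifted():
        coordinates = a()
        next(coordinates)  # discard a1
        yield from coordinates
    return shifted

def naturals():
    """The sample vector (1, 2, 3, ...)."""
    n = 1
    while True:
        yield n
        n += 1

def head(a, count=5):
    """The first `count` coordinates of the vector a."""
    return list(islice(a(), count))

print(head(right_shift(naturals)))              # [0, 1, 2, 3, 4]
print(head(left_shift(naturals)))               # [2, 3, 4, 5, 6]
print(head(left_shift(right_shift(naturals))))  # [1, 2, 3, 4, 5]
print(head(right_shift(left_shift(naturals))))  # [0, 2, 3, 4, 5]
```

The last two lines illustrate why neither operator is invertible: the right shift is injective but misses every vector whose first coordinate is nonzero, and the left shift is surjective but forgets the first coordinate.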


The discussion of bases in the previous section must be changed slightly when we are 
dealing with linear operators. We should pick only one basis B for V, and use it in place of 
both of the bases B and C in (4.2.6). In other words, to define the matrix A of T with respect 
to the basis B, we should write 


(4.3.3) T(B) = BA, and AX = Y as before.


As with any linear transformation (4.2.7), the columns of A are the coordinate vectors of the
images $T(v_j)$ of the basis vectors:

(4.3.4) $T(v_j) = v_1a_{1j} + \cdots + v_na_{nj}$.


A linear operator is invertible if and only if its matrix with respect to an arbitrary basis is an 
invertible matrix. 

When one speaks of the matrix of a linear operator on the space $F^n$, it is assumed
that the basis is the standard basis E, unless a different basis is specified. The operator is then
multiplication by that matrix.


A new feature arises when we study the effect of a change of basis. Suppose that B is
replaced by a new basis B'.


Proposition 4.3.5 Let A be the matrix of a linear operator T with respect to a basis B.

(a) Suppose that a new basis B' is described by B' = BP. The matrix that represents T with
respect to this basis is $A' = P^{-1}AP$.

(b) The matrices A' that represent the operator T for different bases are the matrices of the
form $A' = P^{-1}AP$, where P can be any invertible matrix. □


In other words, the matrix changes by conjugation. This is a confusing fact to grasp.
So, though it follows from (4.2.13), we will rederive it. Since B' = BP and since T(B) = BA,
we have

$T(B') = T(B)P = BAP$.

We are not done. The formula we have obtained expresses T(B') in terms of the old basis B.
To obtain the new matrix, we must write T(B') in terms of the new basis B'. So we substitute
$B = B'P^{-1}$ into the equation. Doing so gives us $T(B') = B'P^{-1}AP$. □

In general, we say that a square matrix A is similar to another matrix A' if $A' = P^{-1}AP$
for some invertible matrix P. Such a matrix A' is obtained from A by conjugating by $P^{-1}$,
and since P can be any invertible matrix, $P^{-1}$ is also arbitrary. It would be correct to use the
term conjugate in place of similar.
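
Because the conjugation formula is easy to get backwards, it may help to check it on numbers. The sketch below makes several assumptions for illustration: the old basis is the standard basis of $F^2$ (so B is the identity matrix and T(B) = BA is just A), the operator swaps the two coordinates, and the helper names `matmul` and `inverse_2x2` are invented here. It verifies the relation T(B') = B'A' in the form AP = PA'.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(P):
    """Inverse of an invertible 2 x 2 matrix, computed exactly."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

# Matrix of T with respect to the standard basis: T swaps the coordinates.
A = [[0, 1],
     [1, 0]]

# Columns of P are the new basis vectors e1 + e2 and e1 - e2.
P = [[1, 1],
     [1, -1]]

A_new = matmul(inverse_2x2(P), matmul(A, P))   # A' = P^{-1} A P

# The columns of A P are the images of the new basis vectors, written in the
# old basis; P A' expresses the same vectors through the new basis.
assert matmul(A, P) == matmul(P, A_new)
print(A_new)
```

In this example the new matrix is diagonal with entries 1 and −1, which reflects the fact that T fixes e1 + e2 and negates e1 − e2.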

Now if we are given the matrix A, it is natural to look for a similar matrix A’ that 
is particularly simple. One would like to get a result somewhat like Theorem 4.2.10. But 
here our allowable change is much more restricted, because we have only one basis, and 
therefore one matrix P, to work with. Having domain and range of a linear transformation 
equal, which seems at first to be a simplification, actually makes things more difficult. 

We can get some insight into the problem by writing the hypothetical basechange
matrix as a product of elementary matrices, say $P = E_1 \cdots E_r$. Then

$P^{-1}AP = E_r^{-1} \cdots E_1^{-1} A E_1 \cdots E_r$.

In terms of elementary operations, we are allowed to change A by a sequence of steps
$A \rightsquigarrow E^{-1}AE$. In other words, we may perform an arbitrary column operation E on A,
but we must also make the row operation that corresponds to the inverse matrix $E^{-1}$.
Unfortunately, these row and column operations interact, and analyzing them becomes
confusing.


4.4 EIGENVECTORS 


The main tools for analyzing a linear operator T: V — V are invariant subspaces and 
eigenvectors. 


• A subspace W of V is invariant, or more precisely T-invariant, if it is carried to itself by
the operator:

(4.4.1) TW ⊂ W.

In other words, W is invariant if, whenever w is in W, T(w) is also in W. When this is so, T
defines a linear operator on W, called its restriction to W. We often denote this restriction
by $T|_W$.

If W is a T-invariant subspace, we may form a basis B of V by appending vectors to a
basis $(w_1, \ldots, w_k)$ of W, say

(4.4.2) $B = (w_1, \ldots, w_k;\ v_1, \ldots, v_{n-k})$.

Then the fact that W is invariant is reflected in the matrix of T. The columns of this matrix,
we'll call it M, are the coordinate vectors of the image vectors (see (4.3.3)). But $T(w_j)$ is
in the subspace W, so it is a linear combination of the basis $(w_1, \ldots, w_k)$. When we write
$T(w_j)$ in terms of the basis B, the coefficients of the vectors $v_1, \ldots, v_{n-k}$ will be zero. It
follows that M will have the block form

(4.4.3) $M = \begin{bmatrix} A & B \\ 0 & D \end{bmatrix}$,


where A is a k × k matrix, the matrix of the restriction of T to W.

If V happens to be the direct sum $W_1 \oplus W_2$ of two T-invariant subspaces, and if we
make a basis $B = (B_1, B_2)$ of V by appending bases of $W_1$ and $W_2$, the matrix of T will have
the block diagonal form

(4.4.4) $M = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}$,

where $A_i$ is the matrix of the restriction of T to $W_i$.

The concept of an eigenvector is closely related to that of an invariant subspace.


• An eigenvector v of a linear operator T is a nonzero vector such that

(4.4.5) T(v) = λv

for some scalar λ, i.e., some element of F. A nonzero column vector is an eigenvector of a
square matrix A if it is an eigenvector for the operation of left multiplication by A.

The scalar λ that appears in (4.4.5) is called the eigenvalue associated to the eigenvector
v. When we speak of an eigenvalue of a linear operator T or of a matrix A without specifying
an eigenvector, we mean a scalar λ that is the eigenvalue associated to some eigenvector.
An eigenvalue may be any element of F, including zero, but an eigenvector is not allowed
to be zero. Eigenvalues are often denoted, as here, by the Greek letter λ (lambda).¹

An eigenvector with eigenvalue 1 is a fixed vector: T(v) = v. An eigenvector with
eigenvalue zero is in the nullspace: T(v) = 0. When $V = \mathbb{R}^n$, a nonzero vector v is an
eigenvector if v and T(v) are parallel.

If v is an eigenvector of a linear operator T, with eigenvalue λ, the subspace W
spanned by v will be T-invariant, because T(cv) = cλv is in W for all scalars c. Conversely,
if the one-dimensional subspace spanned by v is invariant, then v is an eigenvector. So an
eigenvector can be described as a basis of a one-dimensional invariant subspace.

It is easy to tell whether or not a given vector X is an eigenvector of a matrix A. We 
simply check whether or not AX is a multiple of X. And, if A is the matrix of T with respect 
to a basis B, and if X is the coordinate vector of a vector v, then X is an eigenvector of A if 
and only if v is an eigenvector for T. 

The standard basis vector $e_1 = (1, 0)^t$ is an eigenvector, with eigenvalue 3, of the
matrix

$\begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix}$.

The vector $(1, -1)^t$ is another eigenvector, with eigenvalue 2. The vector $(0, 1, 1)^t$ is an
eigenvector, with eigenvalue 2, of the matrix

$A = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 1 & 1 \\ 3 & 0 & 2 \end{bmatrix}$.
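
The test described above, checking whether AX is a multiple of X, takes only a few lines of code. The sketch below assumes integer entries and uses an invented helper name, `eigenvalue_of`; it returns the eigenvalue when X is an eigenvector and `None` otherwise, and is applied to the two matrices of the examples.

```python
from fractions import Fraction

def eigenvalue_of(A, X):
    """Return the scalar lam with A X = lam X, or None if X is not an eigenvector.

    A is a square matrix (list of rows) and X a vector, both with integer entries.
    """
    if all(x == 0 for x in X):
        return None  # the zero vector is never an eigenvector
    AX = [sum(a * x for a, x in zip(row, X)) for row in A]
    # Read off the only possible eigenvalue from a nonzero coordinate of X ...
    i = next(index for index, x in enumerate(X) if x != 0)
    lam = Fraction(AX[i], X[i])
    # ... and check that it works for every coordinate.
    return lam if all(ax == lam * x for ax, x in zip(AX, X)) else None

B = [[3, 1],
     [0, 2]]
A = [[1, 1, -1],
     [2, 1, 1],
     [3, 0, 2]]

print(eigenvalue_of(B, [1, 0]))     # 3
print(eigenvalue_of(B, [1, -1]))    # 2
print(eigenvalue_of(A, [0, 1, 1]))  # 2
print(eigenvalue_of(B, [0, 1]))     # None: B sends (0,1) to (1,2)
```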


¹ The German word “eigen” means roughly “characteristic.” Eigenvectors and eigenvalues are sometimes called
characteristic vectors and characteristic values.

If $(v_1, \ldots, v_n)$ is a basis of V and if $v_1$ is an eigenvector of a linear operator T, the
matrix of T will have the block form

(4.4.6) $\begin{bmatrix} \lambda & B \\ 0 & D \end{bmatrix}$,

where λ is the eigenvalue of $v_1$. This is the block form (4.4.3) in the case of an invariant
subspace of dimension 1.


Proposition 4.4.7 Similar matrices ($A' = P^{-1}AP$) have the same eigenvalues.
This is true because similar matrices represent the same linear transformation. □


Proposition 4.4.8 


(a) Let T be a linear operator on a vector space V. The matrix of T with respect to a basis
$B = (v_1, \ldots, v_n)$ is diagonal if and only if each of the basis vectors $v_j$ is an eigenvector.

(b) An n × n matrix A is similar to a diagonal matrix if and only if there is a basis of $F^n$ that
consists of eigenvectors.

This follows from the definition of the matrix A (see (4.3.4)). If $T(v_j) = \lambda_j v_j$, then

(4.4.9) $T(B) = (v_1\lambda_1, \ldots, v_n\lambda_n) = (v_1, \ldots, v_n)\begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}$. □


This proposition shows that we can represent a linear operator simply by a diagonal
matrix, provided that it has enough eigenvectors. We will see in Section 4.5 that every linear
operator on a complex vector space has at least one eigenvector, and in Section 4.6 that
in most cases there is a basis of eigenvectors. But a linear operator on a real vector space
needn't have any eigenvector. For example, a rotation of the plane through an angle θ
doesn't carry any vector to a parallel one unless θ is 0 or π. The rotation matrix (4.2.2) with
θ ≠ 0, π has no real eigenvector.


• A general example of a real matrix that has at least one real eigenvalue is one all of whose
entries are positive. Such matrices, called positive matrices, occur often in applications, 
and one of their most important properties is that they always have an eigenvector whose 
coordinates are positive (a positive eigenvector). 


Instead of proving this fact, we'll illustrate it by examining the effect of multiplication
by a positive 2 × 2 matrix A on $\mathbb{R}^2$. Let $w_i = Ae_i$ be the columns of A. The parallelogram
law for vector addition shows that A sends the first quadrant S to the sector bounded by the
vectors $w_1$ and $w_2$. The coordinate vector of $w_i$ is the ith column of A. Since the entries of
A are positive, the vectors $w_i$ lie in the first quadrant. So A carries the first quadrant to itself:
$S \supset AS$. Applying A to this inclusion, we find $AS \supset A^2S$, and so on:

(4.4.10) $S \supset AS \supset A^2S \supset A^3S \supset \cdots$,

as is illustrated below for the matrix $A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$.

Now, the intersection of a nested set of sectors is either a sector or a half-line. In our
case, the intersection $Z = \bigcap A^rS$ turns out to be a half-line. This is intuitively plausible,
and it can be shown in various ways, but we'll omit the proof. We multiply the relation
$Z = \bigcap A^rS$ on both sides by A:


$AZ = A\left(\bigcap_{r=0}^{\infty} A^rS\right) = \bigcap_{r=1}^{\infty} A^rS = Z$.

Hence Z = AZ. Therefore the nonzero vectors in Z are eigenvectors.

(4.4.11) Images of the First Quadrant Under Repeated Multiplication by
a Positive Matrix.
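
The shrinking of the sectors can also be watched numerically. The sketch below is an illustration only; the helper names `apply` and `sector_slopes` are assumptions made here. It tracks the slopes of the two bounding vectors $A^re_1$ and $A^re_2$ for the positive matrix with rows (3, 2) and (1, 4). The slopes close in on a common value from both sides, which is the slope of the half-line Z.

```python
def apply(A, v):
    """Multiply the 2 x 2 matrix A by the vector v."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])

def sector_slopes(A, steps):
    """Slopes of the two edges of the sector A^r S for r = 1, ..., steps.

    S is the first quadrant, bounded by e1 and e2, so A^r S is bounded by
    A^r e1 and A^r e2. Exact integers are used until the final division.
    """
    lower, upper = (1, 0), (0, 1)
    slopes = []
    for _ in range(steps):
        lower, upper = apply(A, lower), apply(A, upper)
        slopes.append((lower[1] / lower[0], upper[1] / upper[0]))
    return slopes

A = [[3, 2],
     [1, 4]]

for r, (low, high) in enumerate(sector_slopes(A, 8), start=1):
    print(f"r = {r}:  {low:.6f} <= slope <= {high:.6f}")
```

For this matrix the two slopes approach 1, so the half-line Z is spanned by the vector $(1, 1)^t$, and one checks directly that A multiplies this vector by 5.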


4.5 THE CHARACTERISTIC POLYNOMIAL 


In this section we determine the eigenvectors of an arbitrary linear operator. We recall that
an eigenvector of a linear operator T is a nonzero vector v such that

(4.5.1) T(v) = λv,

for some λ in F. If we don't know λ, it can be difficult to find the eigenvector directly when
the matrix of the operator is complicated. The trick is to solve a different problem, namely
to determine the eigenvalues first. Once an eigenvalue λ is determined, equation (4.5.1)
becomes linear in the coordinates of v, and solving it presents no problem.

We begin by writing (4.5.1) in the form

(4.5.2) [λI − T](v) = 0,




where I stands for the identity operator and λI − T is the linear operator defined by

(4.5.3) [λI − T](v) = λv − T(v).

It is easy to check that λI − T is indeed a linear operator. We can restate (4.5.2) as follows:

(4.5.4) A nonzero vector v is an eigenvector with eigenvalue λ
if and only if it is in the kernel of λI − T.


Corollary 4.5.5 Let T be a linear operator on a finite-dimensional vector space V.

(a) The eigenvalues of T are the scalars λ in F such that the operator λI − T is singular,
i.e., its nullspace is not zero.

(b) The following conditions are equivalent:

• T is a singular operator.
• T has an eigenvalue equal to zero.
• If A is the matrix of T with respect to an arbitrary basis, then det A = 0. □

If A is the matrix of T with respect to some basis, then the matrix of λI − T is λI − A.
So λI − T is singular if and only if det(λI − A) = 0. This determinant can be computed
with indeterminate λ, and doing so provides us, at least in principle, with a method for
determining the eigenvalues and eigenvectors.


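Suppose for example that A is the matrix $\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$ whose action on $\mathbb{R}^2$ is illustrated in
Figure (4.4.11). Then

$\lambda I - A = \begin{bmatrix} \lambda - 3 & -2 \\ -1 & \lambda - 4 \end{bmatrix}$

and

$\det(\lambda I - A) = \lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2)$.

The determinant vanishes when λ = 5 or 2, so the eigenvalues of A are 5 and 2. To find the
eigenvectors, we solve the two systems of equations [5I − A]X = 0 and [2I − A]X = 0. The
solutions are determined up to scalar factor:

(4.5.6) $v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$.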
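
For readers who want to see the two linear systems written out, here is the computation in full. It uses only the matrix λI − A displayed above; the coordinate names $x_1, x_2$ for the entries of X are introduced here for the illustration.

```latex
\begin{aligned}
\lambda = 5:\quad & [5I - A]X =
  \begin{bmatrix} 2 & -2 \\ -1 & 1 \end{bmatrix}
  \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 0
  \iff x_1 = x_2,
  && \text{so } v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}; \\[4pt]
\lambda = 2:\quad & [2I - A]X =
  \begin{bmatrix} -1 & -2 \\ -1 & -2 \end{bmatrix}
  \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 0
  \iff x_1 = -2x_2,
  && \text{so } v_2 = \begin{bmatrix} 2 \\ -1 \end{bmatrix}.
\end{aligned}
```

In each case the two equations of the system are proportional, as they must be when λI − A is singular, and a direct check gives $Av_1 = 5v_1$ and $Av_2 = 2v_2$.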


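We now consider the same computation for an indeterminate matrix of arbitrary size.
It is customary to replace the symbol λ by a variable t. We form the matrix tI − A:

(4.5.7) $tI - A = \begin{bmatrix} (t - a_{11}) & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & (t - a_{22}) & \cdots & -a_{2n} \\ \vdots & \vdots & & \vdots \\ -a_{n1} & -a_{n2} & \cdots & (t - a_{nn}) \end{bmatrix}$.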


The complete expansion of the determinant [Chapter 1 (1.6.4)] shows that det(tI − A) is a
polynomial of degree n in t whose coefficients are scalars, elements of F.

Definition 4.5.8 The characteristic polynomial of a linear operator T is the polynomial
p(t) = det(tI − A),
where A is the matrix of T with respect to some basis.

The eigenvalues of T are determined by combining (4.5.5) and (4.5.8):


Corollary 4.5.9 The eigenvalues of a linear operator are the roots of its characteristic
polynomial. □


Corollary 4.5.10 Let A be an upper or lower triangular n × n matrix with diagonal entries
$a_{11}, \ldots, a_{nn}$. The characteristic polynomial of A is $(t - a_{11}) \cdots (t - a_{nn})$. The diagonal
entries of A are its eigenvalues.

Proof. If A is upper triangular, so is tI − A, and the diagonal entries of tI − A are $t - a_{ii}$.
The determinant of a triangular matrix is the product of its diagonal entries. □


Proposition 4.5.11 The characteristic polynomial of an operator T does not depend on the 
choice of a basis. 


Proof. A second basis leads to a matrix $A' = P^{-1}AP$ (4.3.5), and
$tI - A' = tI - P^{-1}AP = P^{-1}(tI - A)P$. Then
$\det(tI - A') = \det P^{-1} \det(tI - A) \det P = \det(tI - A)$. □


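The characteristic polynomial of the 2 × 2 matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is

(4.5.12) $p(t) = \det(tI - A) = \det\begin{bmatrix} t - a & -b \\ -c & t - d \end{bmatrix} = t^2 - (\operatorname{trace} A)\,t + (\det A)$,

where trace A = a + d.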

An incomplete description of the characteristic polynomial of an n × n matrix is
given by the next proposition, which is proved by computation. It wouldn't be very
difficult to determine the remaining coefficients, but explicit formulas for them aren't often
used.

Proposition 4.5.13 The characteristic polynomial of an n × n matrix A has the form
$p(t) = t^n - (\operatorname{trace} A)\,t^{n-1} + (\text{intermediate terms}) + (-1)^n(\det A)$,
where trace A, the trace of A, is the sum of its diagonal entries:
$\operatorname{trace} A = a_{11} + a_{22} + \cdots + a_{nn}$. □


Proposition 4.5.11 shows that all coefficients of the characteristic polynomial are
independent of the basis. For instance, $\operatorname{trace}(P^{-1}AP) = \operatorname{trace} A$.

Since the characteristic polynomial, the trace, and the determinant are independent of
the basis, they depend only on the operator T. So we may define the terms characteristic
polynomial, trace, and determinant of a linear operator T. They are the ones obtained using
the matrix of T with respect to any basis.
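
All the coefficients of the characteristic polynomial can be computed mechanically. The sketch below does not expand the determinant as the text does; it substitutes the Faddeev–LeVerrier recursion, a standard method that uses only matrix products and traces. The function names and the sample matrices A and P are assumptions made for this illustration. The output can be compared with Proposition 4.5.13 and with the basis independence just discussed.

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(tI - A) = c_0 + c_1 t + ... + c_n t^n.

    Faddeev-LeVerrier recursion: M_1 = I, M_k = A M_{k-1} + c_{n-k+1} I,
    and c_{n-k} = -trace(A M_k) / k.
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    c = [Fraction(0)] * n + [Fraction(1)]      # leading coefficient c_n = 1
    M = [[Fraction(0)] * n for _ in range(n)]  # M_0 = 0
    for k in range(1, n + 1):
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += c[n - k + 1]
        c[n - k] = -trace(matmul(A, M)) / k
    return c

A = [[1, 1, -1],
     [2, 1, 1],
     [3, 0, 2]]

# An invertible matrix and its inverse, chosen by hand for the example.
P     = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
P_inv = [[1, -1, 0], [0, 1, 0], [0, 0, 1]]
A_conj = matmul(P_inv, matmul(A, P))           # P^{-1} A P

print([int(x) for x in char_poly(A)])          # [-4, 6, -4, 1]
print(char_poly(A) == char_poly(A_conj))       # True
```

So for this matrix $p(t) = t^3 - 4t^2 + 6t - 4$: the coefficient of $t^2$ is minus the trace, the constant term is $(-1)^3 \det A$, and p(2) = 0, in agreement with the eigenvalue 2 found in Section 4.4.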


Proposition 4.5.14 Let T be a linear operator on a finite-dimensional vector space V.

(a) If V has dimension n, then T has at most n eigenvalues.
(b) If F is the field of complex numbers and V ≠ {0}, then T has at least one eigenvalue, and
hence at least one eigenvector.

Proof. (a) The eigenvalues are the roots of the characteristic polynomial, which has degree
n. A polynomial of degree n can have at most n roots. This is true for a polynomial with
coefficients in any field F (see (12.2.20)).

(b) The Fundamental Theorem of Algebra asserts that every polynomial of positive degree
with complex coefficients has at least one complex root. There is a proof of this theorem in
Chapter 15 (15.10.1). □


For example, let $R_\theta$ be the matrix (4.2.2) that represents the counterclockwise rotation of
$\mathbb{R}^2$ through an angle θ. Its characteristic polynomial, $p(t) = t^2 - (2\cos\theta)t + 1$, has no real
root provided that θ ≠ 0, π, so no real eigenvalue. We have observed this before. But the
operator on $\mathbb{C}^2$ defined by $R_\theta$ does have the complex eigenvalues $e^{\pm i\theta}$.


Note: When we speak of the roots of a polynomial p(t) or the eigenvalues of a matrix or 
linear operator, repetitions corresponding to multiple roots are supposed to be included. 
This terminology is convenient, though imprecise. □


Corollary 4.5.15 If $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of an n × n complex matrix A, then det A
is the product $\lambda_1 \cdots \lambda_n$, and trace A is the sum $\lambda_1 + \cdots + \lambda_n$.

Proof. Let p(t) be the characteristic polynomial of A. Then

$(t - \lambda_1) \cdots (t - \lambda_n) = p(t) = t^n - (\operatorname{trace} A)\,t^{n-1} + \cdots + (-1)^n(\det A)$. □


4.6 TRIANGULAR AND DIAGONAL FORMS 


In this section we show that for “most” linear operators on a complex vector space, there is
a basis such that the matrix of the operator is diagonal. The key fact, which was noted at the
end of Section 4.5, is that every complex polynomial of positive degree has a root. This tells
us that every linear operator has at least one eigenvector.


Proposition 4.6.1 


(a) Vector space form: Let T be a linear operator on a finite-dimensional complex vector 
space V. There is a basis B of V such that the matrix of T with respect to that basis is 
upper triangular. 

(b) Matrix form: Every complex n × n matrix A is similar to an upper triangular matrix:
There is a matrix $P \in GL_n(\mathbb{C})$ such that $P^{-1}AP$ is upper triangular.

Proof. The two assertions are equivalent, because of (4.3.5). We will work with the matrix.
Let $V = \mathbb{C}^n$. Proposition 4.5.14(b) shows that V contains an eigenvector of A, call it $v_1$.
Let λ be its eigenvalue. We extend $(v_1)$ to a basis $B = (v_1, \ldots, v_n)$ for V. The new matrix
$A' = P^{-1}AP$ has the block form

(4.6.2) $A' = \begin{bmatrix} \lambda & * \\ 0 & D \end{bmatrix}$,

where D is an (n − 1) × (n − 1) matrix (see (4.4.6)). By induction on n, we may assume that
the existence of a matrix $Q \in GL_{n-1}(\mathbb{C})$ such that $Q^{-1}DQ$ is upper triangular will have been
proved. Let

$Q_1 = \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}$. Then $A'' = Q_1^{-1}A'Q_1 = \begin{bmatrix} \lambda & * \\ 0 & Q^{-1}DQ \end{bmatrix}$

is upper triangular, and $A'' = (PQ_1)^{-1}A(PQ_1)$. □


Corollary 4.6.3 Proposition 4.6.1 continues to hold when the phrase “upper triangular” is
replaced by “lower triangular.”


The lower triangular form is obtained by listing the basis B of (4.6.1)(a) in reverse 
order. 


The important point for the proof of Proposition 4.6.1 is that every complex polynomial 
has a root. The same proof will work for any field F, provided that all the roots of the 
characteristic polynomial are in the field. 


Corollary 4.6.4 


(a) Vector space form: Let T be a linear operator on a finite-dimensional vector space V
over a field F, and suppose that the characteristic polynomial of T is a product of linear
factors in the field F. There is a basis B of V such that the matrix A of T is upper (or
lower) triangular.

(b) Matrix form: Let A be an n × n matrix with entries in F, whose characteristic polynomial
is a product of linear factors. There is a matrix $P \in GL_n(F)$ such that $P^{-1}AP$ is upper
(or lower) triangular.

The proof is the same, except that to make the induction step one has to check that the
characteristic polynomial of the matrix D that appears in (4.6.2) is p(t)/(t − λ), where p(t)
is the characteristic polynomial of A. Then the hypothesis that the characteristic polynomial
factors into linear factors carries over from A to D. □


We now ask which matrices A are similar to diagonal matrices. They are called 
diagonalizable matrices. As we saw in (4.4.8) (b), they are the matrices that have bases 
of eigenvectors. Similarly, a linear operator that has a basis of eigenvectors is called a 
diagonalizable operator. The diagonal entries are determined, except for their order, by the 
linear operator T. They are the eigenvalues. 


Theorem 4.6.6 below gives a partial answer to our question; a more complete answer
will be given in the next section.

Proposition 4.6.5 Let $v_1, \ldots, v_r$ be eigenvectors of a linear operator T with distinct
eigenvalues $\lambda_1, \ldots, \lambda_r$. The set $(v_1, \ldots, v_r)$ is independent.

Proof. We use induction on r. The assertion is true when r = 1, because an eigenvector
cannot be zero. Suppose that a dependence relation

$0 = a_1v_1 + \cdots + a_rv_r$

is given. We must show that $a_i = 0$ for all i. We apply the operator T:

$0 = T(0) = a_1T(v_1) + \cdots + a_rT(v_r) = a_1\lambda_1v_1 + \cdots + a_r\lambda_rv_r$.

This is a second dependence relation among $(v_1, \ldots, v_r)$. We eliminate $v_r$ from the two
relations, multiplying the first relation by $\lambda_r$ and subtracting the second:

$0 = a_1(\lambda_r - \lambda_1)v_1 + \cdots + a_{r-1}(\lambda_r - \lambda_{r-1})v_{r-1}$.

Applying induction, we may assume that $(v_1, \ldots, v_{r-1})$ is an independent set. This tells us
that the coefficients $a_i(\lambda_r - \lambda_i)$, i < r, are all zero. Since the $\lambda_i$ are distinct, $\lambda_r - \lambda_i$ is not
zero if i < r. Thus $a_1 = \cdots = a_{r-1} = 0$. The original relation reduces to $0 = a_rv_r$. Since an
eigenvector cannot be zero, $a_r$ is zero too. □


The next theorem follows by combining (4.4.8) and (4.6.5): 


Theorem 4.6.6 Let T be a linear operator on a vector space V of dimension n over a field 
F. If its characteristic polynomial has n distinct roots in F, there is a basis for V with respect 
to which the matrix of T is diagonal. □


Note: Diagonalization is a powerful tool. When one is presented with a diagonalizable 
operator, it should be an automatic response to work with a basis of eigenvectors. 


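As an example of diagonalization, consider the real matrix

(4.6.7) $A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$.

Its eigenvectors were computed in (4.5.6). These eigenvectors form a basis $B = (v_1, v_2)$ of
$\mathbb{R}^2$. According to (3.5.13), the matrix relating the standard basis E to this basis B is

(4.6.8) $P = [B] = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}$, and

(4.6.9) $P^{-1}AP = \frac{1}{3}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 5 & \\ & 2 \end{bmatrix} = \Lambda$.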


The next proposition is a variant of Proposition 4.4.8. We omit the proof.

Proposition 4.6.10 Let F be a field.

(a) Let T be a linear operator on $F^n$. If $B = (v_1, \ldots, v_n)$ is a basis of eigenvectors of T, and
if P = [B], then $\Lambda = P^{-1}AP = [B]^{-1}A[B]$ is diagonal.

(b) Let $B = (v_1, \ldots, v_n)$ be a basis of $F^n$, and let Λ be the diagonal matrix with diagonal
entries $\lambda_1, \ldots, \lambda_n$ that are not necessarily distinct. There is a unique matrix A such
that, for i = 1, ..., n, $v_i$ is an eigenvector of A with eigenvalue $\lambda_i$, namely the matrix
$[B]\Lambda[B]^{-1}$. □

A nice way to write the equation $[B]^{-1}A[B] = \Lambda$ is

(4.6.11) $A[B] = [B]\Lambda$.

One application of Theorem 4.6.6 is to compute the powers of a diagonalizable matrix.
The next lemma needs to be pointed out, though it follows trivially when one expands the
left sides of the equations and cancels $PP^{-1}$.

Lemma 4.6.12 Let A, B, and P be n × n matrices. If P is invertible, then $(P^{-1}AP)(P^{-1}BP) =
P^{-1}(AB)P$, and for all k ≥ 1, $(P^{-1}AP)^k = P^{-1}A^kP$. □


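Thus if A, P, and Λ are as in (4.6.9), then

$A^k = P\Lambda^kP^{-1} = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 5 & \\ & 2 \end{bmatrix}^k \frac{1}{3}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 5^k + 2^{k+1} & 2\cdot 5^k - 2^{k+1} \\ 5^k - 2^k & 2\cdot 5^k + 2^k \end{bmatrix}$.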
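
A closed formula of this kind is worth testing against direct multiplication. The following sketch does so for small k; the helper names `power` and `closed_form` are assumptions made for the illustration, and exact fractions are used so that the division by 3 introduces no rounding.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def power(A, k):
    """A^k by repeated multiplication, for k >= 0."""
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = matmul(result, A)
    return result

def closed_form(k):
    """The matrix (1/3)[[5^k + 2^(k+1), 2*5^k - 2^(k+1)], [5^k - 2^k, 2*5^k + 2^k]]."""
    return [[Fraction(5**k + 2**(k + 1), 3), Fraction(2 * 5**k - 2**(k + 1), 3)],
            [Fraction(5**k - 2**k, 3), Fraction(2 * 5**k + 2**k, 3)]]

A = [[3, 2],
     [1, 4]]

for k in range(1, 9):
    assert power(A, k) == closed_form(k)

print(power(A, 3))  # [[47, 78], [39, 86]]
```

The direct computation takes k matrix multiplications, while the closed form needs only the powers $5^k$ and $2^k$ of the eigenvalues.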


If $f(t) = a_0 + a_1t + \cdots + a_nt^n$ is a polynomial in t with coefficients in F and if A is an
n × n matrix with entries in F, then f(A) will denote the matrix obtained by substituting A
formally for t.

(4.6.13) $f(A) = a_0I + a_1A + \cdots + a_nA^n$.

The constant term $a_0$ gets replaced by $a_0I$. Then if $A = P\Lambda P^{-1}$,

(4.6.14) $f(A) = f(P\Lambda P^{-1}) = a_0I + a_1P\Lambda P^{-1} + \cdots + a_nP\Lambda^nP^{-1} = Pf(\Lambda)P^{-1}$.
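
Formula (4.6.14) can be checked on the matrix of (4.6.7). In the sketch below, `poly_at_matrix` substitutes a matrix for t using Horner's rule, and `via_diagonal` evaluates $Pf(\Lambda)P^{-1}$, where f(Λ) is diagonal with entries f(5) and f(2). The function names and the two sample polynomials are assumptions made for this illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_at_matrix(coeffs, A):
    """f(A) = a_0 I + a_1 A + ... + a_n A^n for coeffs = [a_0, ..., a_n]."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for a in reversed(coeffs):          # Horner: result = result * A + a I
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += a
    return result

def poly_at_scalar(coeffs, x):
    return sum(a * x**i for i, a in enumerate(coeffs))

A = [[3, 2], [1, 4]]
P = [[1, 2], [1, -1]]                   # columns are the eigenvectors v1, v2
P_inv = [[Fraction(1, 3), Fraction(2, 3)],
         [Fraction(1, 3), Fraction(-1, 3)]]
eigenvalues = [5, 2]

def via_diagonal(coeffs):
    """P f(Lambda) P^{-1}, with f applied to each diagonal entry of Lambda."""
    f_lambda = [[poly_at_scalar(coeffs, eigenvalues[0]), 0],
                [0, poly_at_scalar(coeffs, eigenvalues[1])]]
    return matmul(P, matmul(f_lambda, P_inv))

f = [1, 0, 1]       # f(t) = 1 + t^2
p = [10, -7, 1]     # p(t) = 10 - 7t + t^2, the characteristic polynomial of A

print(poly_at_matrix(f, A), poly_at_matrix(f, A) == via_diagonal(f))
print(poly_at_matrix(p, A))
```

Since p(5) = p(2) = 0, the matrix p(Λ) is zero, and (4.6.14) then forces p(A) to be the zero matrix, which is what the last line prints.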


The analogous notation is used for linear operators: If T is a linear operator on a vector
space V over a field F, the linear operator f(T) on V is defined to be

(4.6.15) $f(T) = a_0I + a_1T + \cdots + a_nT^n$,

where I denotes the identity operator. The operator f(T) acts on a vector by $f(T)v =
a_0v + a_1Tv + \cdots + a_nT^nv$. (In order to avoid too many parentheses we have omitted some
by writing Tv for T(v).)


4.7 JORDAN FORM


Suppose we are given a linear operator T on a finite-dimensional complex vector space
V. We have seen that, if the roots of its characteristic polynomial are distinct, there is
a basis of eigenvectors, and that the matrix of T with respect to that basis is diagonal.
Here we ask what can be done without assuming that the eigenvalues are distinct.
When the characteristic polynomial has multiple roots there will most often not be a
basis of eigenvectors, but we'll see that, nevertheless, the matrix can be made fairly
simple.

An eigenvector with eigenvalue λ of a linear operator T is a nonzero vector v such
that (T − λ)v = 0. (We will write T − λ for T − λI here.) Since our operator T may not
have enough eigenvectors, we work with generalized eigenvectors.

• A generalized eigenvector with eigenvalue λ of a linear operator T is a nonzero vector x
such that $(T - \lambda)^kx = 0$ for some k > 0. Its exponent is the smallest integer d such that
$(T - \lambda)^dx = 0$.

Proposition 4.7.1 Let x be a generalized eigenvector of T, with eigenvalue λ and exponent
d, and for j ≥ 0, let $u_j = (T - \lambda)^jx$. Let $B = (u_0, \ldots, u_{d-1})$, and let X = Span B. Then X
is a T-invariant subspace, and B is a basis of X.


We use the next lemma in the proof. 


Lemma 4.7.2 With $u_j$ as above, a linear combination $y = c_ju_j + \cdots + c_{d-1}u_{d-1}$ with
j ≤ d − 1 and $c_j \neq 0$ is a generalized eigenvector, with eigenvalue λ and exponent d − j.

Proof. Since the exponent of x is d, $(T - \lambda)^{d-1}x = u_{d-1} \neq 0$. Therefore $(T - \lambda)^{d-j-1}y =
c_ju_{d-1}$ isn't zero, but $(T - \lambda)^{d-j}y = 0$. So y is a generalized eigenvector with eigenvalue λ
and exponent d − j, as claimed. □


Proof of the Proposition. We note that

(4.7.3) $Tu_j = \begin{cases} \lambda u_j + u_{j+1} & \text{if } j < d - 1 \\ \lambda u_j & \text{if } j = d - 1 \\ 0 & \text{if } j > d - 1. \end{cases}$

Therefore $Tu_j$ is in the subspace X for all j. This shows that X is invariant. Next, B
generates X by definition. The lemma shows that every nontrivial linear combination of B is
a generalized eigenvector, so it is not zero. Therefore B is an independent set. □


Corollary 4.7.4 Let x be a generalized eigenvector for T, with eigenvalue λ. Then λ is an
ordinary eigenvalue, that is, a root of the characteristic polynomial of T.


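Proof. If the exponent of x is d, then with notation as above, $u_{d-1}$ is an eigenvector with
eigenvalue λ. □

Formula 4.7.3 determines the matrix that describes the action of T on the basis B of
Proposition 4.7.1. It is the d × d Jordan block $J_\lambda$. Jordan blocks are shown below for low
values of d:

(4.7.5) $J_\lambda = [\lambda],\quad \begin{bmatrix} \lambda & \\ 1 & \lambda \end{bmatrix},\quad \begin{bmatrix} \lambda & & \\ 1 & \lambda & \\ & 1 & \lambda \end{bmatrix},\quad \begin{bmatrix} \lambda & & & \\ 1 & \lambda & & \\ & 1 & \lambda & \\ & & 1 & \lambda \end{bmatrix},\ \ldots$

The operation of a Jordan block is especially simple when λ = 0. The d × d block $J_0$
operates on the standard basis of $\mathbb{C}^d$ as

(4.7.6) $e_1 \rightsquigarrow e_2 \rightsquigarrow \cdots \rightsquigarrow e_d \rightsquigarrow 0$.

The 1 × 1 Jordan block $J_0$ is zero.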
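
Jordan blocks and exponents are easy to experiment with. The sketch below follows the convention of formula (4.7.3), with λ on the diagonal and 1 just below it; the function names `jordan_block`, `apply`, and `exponent` are assumptions made for this illustration, and integer entries are used to keep the arithmetic exact.

```python
def jordan_block(lam, d):
    """The d x d Jordan block: lam on the diagonal, 1 just below the diagonal."""
    return [[lam if i == j else 1 if i == j + 1 else 0 for j in range(d)]
            for i in range(d)]

def apply(M, v):
    """Multiply the matrix M by the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def exponent(lam, M, x):
    """Smallest d with (M - lam I)^d x = 0, or None if there is none.

    A nonzero x is a generalized eigenvector with eigenvalue lam exactly when
    the result is a positive integer; at most len(x) steps are ever needed.
    """
    n = len(x)
    N = [[M[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    v = list(x)
    for d in range(n + 1):
        if all(c == 0 for c in v):
            return d
        v = apply(N, v)
    return None

J0 = jordan_block(0, 3)
print(apply(J0, [1, 0, 0]))   # [0, 1, 0]: e1 goes to e2
print(apply(J0, [0, 0, 1]))   # [0, 0, 0]: e3 goes to 0

J = jordan_block(5, 4)
print(exponent(5, J, [1, 0, 0, 0]))   # 4
print(exponent(5, J, [0, 0, 1, 0]))   # 2
print(exponent(3, J, [1, 0, 0, 0]))   # None: 3 is not an eigenvalue of J
```

The second exponent agrees with Lemma 4.7.2: the vector $e_3$ plays the role of $u_2$ in a chain of length d = 4, so its exponent is d − j = 2.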


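The Jordan Decomposition Theorem below asserts that any complex n × n matrix is
similar to a matrix J made up of diagonal Jordan blocks (4.7.5), that is, that it has the Jordan form

(4.7.7) $J = \begin{bmatrix} J_1 & & & \\ & J_2 & & \\ & & \ddots & \\ & & & J_\ell \end{bmatrix}$,

where $J_i = J_{\lambda_i}$ for some $\lambda_i$. The blocks $J_i$ can have various sizes $d_i$, with $\sum d_i = n$,
and the diagonal entries $\lambda_i$ aren't necessarily distinct. The characteristic polynomial of the
matrix J is

(4.7.8) $p(t) = (t - \lambda_1)^{d_1}(t - \lambda_2)^{d_2} \cdots (t - \lambda_\ell)^{d_\ell}$.

The 2 × 2 and 3 × 3 Jordan forms are

(4.7.9) $\begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix},\ \begin{bmatrix} \lambda_1 & \\ 1 & \lambda_1 \end{bmatrix};\quad \begin{bmatrix} \lambda_1 & & \\ & \lambda_2 & \\ & & \lambda_3 \end{bmatrix},\ \begin{bmatrix} \lambda_1 & & \\ 1 & \lambda_1 & \\ & & \lambda_2 \end{bmatrix},\ \begin{bmatrix} \lambda_1 & & \\ 1 & \lambda_1 & \\ & 1 & \lambda_1 \end{bmatrix}$,

where the scalars $\lambda_i$ may be equal or not, and in the fourth matrix, the blocks may be listed
in the other order.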


Theorem 4.7.10 Jordan Decomposition. 

(a) Vector space form: Let T be a linear operator on a finite-dimensional complex vector 
space V. There is a basis B of V such that the matrix of T with respect to B has Jordan 
form (4.7.7). 

(b) Matrix form: Let A be an n × n complex matrix. There is an invertible complex matrix P
such that $P^{-1}AP$ has Jordan form.

It is also true that the Jordan form of an operator T or a matrix A is unique except for the
order of the blocks.

Proof. This proof is due to Filippov [Filippov]. Induction on the dimension of V allows us
to assume that the theorem is true for the restriction of T to any proper invariant subspace.
So if V is the direct sum of proper T-invariant subspaces, say $V_1 \oplus \cdots \oplus V_r$, with r > 1, then
the theorem is true for T.


Suppose that we have generalized eigenvectors $v_i$, for i = 1, ..., r. Let $V_i$ be the
subspace defined as in Proposition 4.7.1, with $x = v_i$. If V is the direct sum $V_1 \oplus \cdots \oplus V_r$,
the theorem will be true for V, and we say that $v_1, \ldots, v_r$ are Jordan generators for T. We
will show that a set of Jordan generators exists.


Step 1: We choose an eigenvalue λ of T, and replace the operator T by T − λI. If A is the
matrix of T with respect to a basis, the matrix of T − λI with respect to the same basis will
be A − λI, and if one of the matrices A or A − λI is in Jordan form, so is the other. So
replacing T by T − λI is permissible. Having done this, our operator, which we still call T,
will have zero as an eigenvalue. This will simplify the notation.


Step 2: We assume that 0 is an eigenvalue of T. Let $K_i$ and $U_i$ denote the kernel and image,
respectively, of the ith power $T^i$. Then $K_1 \subset K_2 \subset \cdots$ and $U_1 \supset U_2 \supset \cdots$. Because V is
finite-dimensional, these chains of subspaces become constant for large m, say $K_m = K_{m+1} = \cdots$
and $U_m = U_{m+1} = \cdots$. Let $K = K_m$ and $U = U_m$. We verify that K and U are invariant
subspaces, and that V is the direct sum K ⊕ U.

The subspaces are invariant because $TK_m \subset K_{m-1} \subset K_m$ and $TU_m = U_{m+1} = U_m$.
To show that V = K ⊕ U, it suffices to show that K ∩ U = {0} (see Proposition 4.3.1(b),
applied to the operator $T^m$, whose kernel is K and whose image is U).
Let z be an element of K ∩ U. Then $T^mz = 0$, and also $z = T^mv$ for some v in V. Therefore
$T^{2m}v = 0$, so v is an element of $K_{2m}$. But $K_{2m} = K_m$, so $T^mv = 0$, i.e., z = 0.

Since T has an eigenvalue 0, K is not the zero subspace. Therefore U has smaller
dimension than V, and by our induction assumption, the theorem is true for $T|_U$. Unfortunately,
we can't use this reasoning on K, because U might be zero. So we must still prove
the existence of a Jordan form for $T|_K$. We replace V by K and T by $T|_K$.


• A linear operator $T$ on a vector space $V$ is called nilpotent if for some positive integer $r$,
the operator $T^r$ is zero.


We have reduced the proof to the case of a nilpotent operator. 


Step 3: We assume that our operator $T$ is nilpotent. Every nonzero vector will be a generalized
eigenvector with eigenvalue 0. Let $N$ and $W$ denote the kernel and image of $T$, respectively.
Since $T$ is nilpotent, $N \neq \{0\}$. Therefore the dimension of $W$ is smaller than that of $V$,
and by induction, the theorem is true for the restriction of the operator to $W$. So there
are Jordan generators $w_1, \ldots, w_r$ for $T|_W$. Let $e_i$ denote the exponent of $w_i$, and let $W_i$
denote the subspace formed as in Proposition 4.7.1, using the generalized eigenvector $w_i$.
So $W = W_1 \oplus \cdots \oplus W_r$.

For each $i$, we choose an element $v_i$ of $V$ such that $Tv_i = w_i$. The exponent $d_i$ of $v_i$
will be equal to $e_i + 1$. Let $V_i$ denote the subspace formed as in (4.7.1) using the vector $v_i$.
Then $TV_i = W_i$. Let $U$ denote the sum $V_1 + \cdots + V_r$. Since each $V_i$ is an invariant subspace,
so is $U$. We now verify that $v_1, \ldots, v_r$ are Jordan generators for the restriction $T|_U$, i.e.,
that the subspaces $V_i$ are independent.

We notice two things: First, $TU = W$ because $TV_i = W_i$. Second, $V_i \cap N \subset W_i$. This
follows from Lemma 4.7.2, which shows that $V_i \cap N$ is the span of the last basis vector
$T^{d_i - 1} v_i$. Since $d_i - 1 = e_i$, which is positive, $T^{d_i - 1} v_i$ is in the image $W_i$.

We suppose given a relation $\tilde v_1 + \cdots + \tilde v_r = 0$, with $\tilde v_i$ in $V_i$. We must show that $\tilde v_i = 0$
for all $i$. Let $\tilde w_i = T\tilde v_i$. Then $\tilde w_1 + \cdots + \tilde w_r = 0$, and $\tilde w_i$ is in $W_i$. Since the subspaces $W_i$ are
independent, $\tilde w_i = 0$ for all $i$. So $T\tilde v_i = 0$, which means that $\tilde v_i$ is in $V_i \cap N$. Therefore $\tilde v_i$ is
in $W_i$. Using the fact that the subspaces $W_i$ are independent once more, we conclude that
$\tilde v_i = 0$ for all $i$.


Step 4: We show that a set of Jordan generators for $T$ can be obtained by adding some
elements of $N$ to the set $\{v_1, \ldots, v_r\}$ of Jordan generators for $T|_U$.

Let $v$ be an arbitrary element of $V$ and let $Tv = w$. Since $TU = W$, there is a vector $u$
in $U$ such that $Tu = w = Tv$. Then $z = v - u$ is in $N$ and $v = u + z$. Therefore $U + N = V$.
This being so, we extend a basis of $U$ to a basis of $V$ by adding elements, say $z_1, \ldots, z_\ell$, of
$N$ (see Proposition 3.4.16(a)). Let $N'$ be the span of $(z_1, \ldots, z_\ell)$. Then $U \cap N' = \{0\}$ and
$U + N' = V$, so $V$ is the direct sum $U \oplus N'$.

The operator $T$ is zero on $N'$, so $N'$ is an invariant subspace, and the matrix of $T|_{N'}$ is
the zero matrix, which has Jordan form. Its Jordan blocks are $1 \times 1$ zero matrices. Therefore
$\{v_1, \ldots, v_r; z_1, \ldots, z_\ell\}$ is a set of Jordan generators for $T$. □


It isn’t difficult to determine the Jordan form for an operator $T$, provided that the
eigenvalues are known, and the analysis also proves uniqueness of the form. However, 
finding an appropriate basis of V can be painful, and is best avoided. 

To determine the Jordan form, one chooses an eigenvalue $\lambda$, and replaces $T$ by $T - \lambda I$,
to reduce to the case that $\lambda = 0$. Let $K_i$ denote the kernel of $T^i$, and let $k_i$ be the dimension
of $K_i$. In the case of a single $d \times d$ Jordan block with $\lambda = 0$, these dimensions are:

$$k_i^{\text{block}} = \begin{cases} i & \text{if } i \le d, \\ d & \text{if } i \ge d. \end{cases}$$

The dimensions $k_i$ for a general operator $T$ are obtained by adding the numbers for
each block with $\lambda = 0$. So $k_1$ will be the number of blocks with $\lambda = 0$, $k_2 - k_1$ will be the
number of blocks of size $d \ge 2$ with $\lambda = 0$, and so on.


Two simple examples: 


$$A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & -1 & 1 \\ 2 & -2 & 2 \\ 1 & -1 & 1 \end{bmatrix}.$$

Here $A^3 = 0$, but $A^2 \neq 0$. If $v$ is a vector such that $A^2 v \neq 0$, for instance $v = e_1$, then
$(v, Tv, T^2 v)$ will be a basis. The Jordan form consists of a single $3 \times 3$ block.

On the other hand, $B^2 = 0$. Taking $v = e_1$ again, the set $(v, Tv)$ is independent, and
this gives us a $2 \times 2$ block. To obtain the Jordan form, we have to add a vector in $N$, for
example $v' = e_2 + e_3$, which will give a $1 \times 1$ block (equal to zero). The required basis is
$(v, Tv, v')$.
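The count of blocks by kernel dimensions is easy to mechanize. The following sketch is one possible way to do it, not a method prescribed by the text: it computes $k_i = \dim \ker A^i$ by row reduction over the rationals and reads off the block sizes for the eigenvalue 0. The function names are ours, and the entries of `A` and `B` are our reading of the two example matrices (chosen so that $A^3 = 0$, $A^2 \neq 0$, and $B^2 = 0$, as the discussion requires).

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rank(A):
    """Rank, computed by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def zero_block_sizes(A):
    """Sizes of the Jordan blocks with eigenvalue 0, largest first."""
    n = len(A)
    k = [0]                          # k[0] = dim ker A^0 = 0
    P = A
    while True:
        k.append(n - rank(P))        # k[i] = dim ker A^i
        if k[-1] == k[-2]:           # the chain of kernels has become constant
            k.pop()
            break
        P = matmul(P, A)
    # k[i] - k[i-1] is the number of blocks of size at least i.
    at_least = [k[i] - k[i - 1] for i in range(1, len(k))] + [0]
    sizes = []
    for d in range(len(at_least) - 1, 0, -1):
        sizes += [d] * (at_least[d - 1] - at_least[d])   # blocks of size exactly d
    return sizes

A = [[0, 1, 0], [1, 0, 1], [0, -1, 0]]
B = [[1, -1, 1], [2, -2, 2], [1, -1, 1]]
print(zero_block_sizes(A), zero_block_sizes(B))
```

For `A` the kernel dimensions are 1, 2, 3, which gives one block of size 3. For `B` they are 2, 3, which gives one block of size 2 and one of size 1, in agreement with the bases found by hand.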
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It is often useful to write the Jordan form as J = D + N, where D is the diagonal part 
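of the matrix, and $N$ is the part below the diagonal. For a single Jordan block, we will have
$D = \lambda I$ and $N = J_0$, as is illustrated below for a $3 \times 3$ block:

$$J_\lambda = \begin{bmatrix} \lambda & & \\ 1 & \lambda & \\ & 1 & \lambda \end{bmatrix} = \begin{bmatrix} \lambda & & \\ & \lambda & \\ & & \lambda \end{bmatrix} + \begin{bmatrix} 0 & & \\ 1 & 0 & \\ & 1 & 0 \end{bmatrix} = \lambda I + J_0 = D + N.$$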


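Writing $J = D + N$ is convenient because $D$ and $N$ commute. The powers of $J$ can be
computed by the binomial expansion:

(4.7.11) $J^r = (D + N)^r = D^r + \binom{r}{1} D^{r-1} N + \binom{r}{2} D^{r-2} N^2 + \cdots$

When $J$ is an $n \times n$ matrix, $N^n = 0$, and this expansion has at most $n$ terms. In the case of a
single block, the formula reads

(4.7.12) $J_\lambda^r = (\lambda I + J_0)^r = \lambda^r I + \binom{r}{1} \lambda^{r-1} J_0 + \binom{r}{2} \lambda^{r-2} J_0^2 + \cdots$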
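The single-block formula can be checked numerically. In the sketch below (the function names are ours, and the block has its 1's just below the diagonal, as in this section), the power $J_0^k$ has 1's on the $k$th subdiagonal, so the $i,j$-entry of the $r$th power of the block is $\binom{r}{i-j}\lambda^{r-(i-j)}$ when $0 \le i - j \le r$ and zero otherwise. We compare this with repeated multiplication.

```python
from math import comb

def jordan_block(lam, n):
    """n x n Jordan block: lam on the diagonal, 1's just below the diagonal."""
    return [[lam if i == j else (1 if i == j + 1 else 0) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matpow(A, r):
    """A^r by repeated multiplication (r >= 0)."""
    n = len(A)
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(r):
        P = matmul(P, A)
    return P

def block_power_by_binomial(lam, n, r):
    """Entries of (lam*I + J_0)^r read off from the binomial expansion."""
    return [[comb(r, i - j) * lam ** (r - (i - j)) if 0 <= i - j <= r else 0
             for j in range(n)] for i in range(n)]

direct = matpow(jordan_block(2, 3), 5)
expanded = block_power_by_binomial(2, 3, 5)
print(direct == expanded, expanded)
```

With $\lambda = 2$, $n = 3$, $r = 5$ the expansion stops after three terms: $32I + 80J_0 + 80J_0^2$.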


Corollary 4.7.13 Let T be a linear operator on a finite-dimensional complex vector space. 
The following conditions are equivalent: 


(a) T is a diagonalizable operator, 
(b) every generalized eigenvector is an eigenvector, 
(c) all of the blocks in the Jordan form for $T$ are $1 \times 1$ blocks.


The analogous statements are true for a square complex matrix A. 


Proof. (a) $\Rightarrow$ (b): Suppose that $T$ is diagonalizable, say that the matrix of $T$ with respect to
the basis $\mathbf{B} = (v_1, \ldots, v_n)$ is the diagonal matrix $\Lambda$ with diagonal entries $\lambda_1, \ldots, \lambda_n$. Let
$v$ be a generalized eigenvector in $V$, say that $(T - \lambda)^k v = 0$ for some $\lambda$ and some $k > 0$.
We replace $T$ by $T - \lambda$ to reduce to the case that $T^k v = 0$. Let $X = (x_1, \ldots, x_n)^t$ be the
coordinate vector of $v$. The coordinates of $T^k v$ will be $\lambda_i^k x_i$. Since $T^k v = 0$, either $\lambda_i = 0$,
or $x_i = 0$, and in either case, $\lambda_i x_i = 0$. Therefore $Tv = 0$.

(b) $\Rightarrow$ (c): We prove the contrapositive. If the Jordan form of $T$ has a $k \times k$ Jordan block with
$k > 1$, then looking back at the action (4.7.6) of $J_\lambda - \lambda I$, we see that there is a generalized
eigenvector that is not an eigenvector. So if (c) is false, (b) is false too. Finally, it is clear that
(c) $\Rightarrow$ (a). □


Here is a nice application of Jordan form. 


Theorem 4.7.14 Let $T$ be a linear operator on a finite-dimensional complex vector space $V$.
If some positive power of $T$ is the identity, say $T^r = I$, then $T$ is diagonalizable.


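Proof. It suffices to show that every generalized eigenvector is an eigenvector. To do this,
we assume that $(T - \lambda I)^2 v = 0$ with $v \neq 0$, and we show that $(T - \lambda)v = 0$. Since $\lambda$ is an
eigenvalue and since $T^r = I$, $\lambda^r = 1$. We divide the polynomial $t^r - 1$ by $t - \lambda$:

$$t^r - 1 = (t^{r-1} + \lambda t^{r-2} + \cdots + \lambda^{r-2} t + \lambda^{r-1})(t - \lambda).$$

We substitute $T$ for $t$ and apply the operators to $v$. Let $w = (T - \lambda)v$. Since $T^r - I = 0$,

$$\begin{aligned} 0 = (T^r - I)v &= (T^{r-1} + \lambda T^{r-2} + \cdots + \lambda^{r-2} T + \lambda^{r-1})(T - \lambda)v \\ &= (T^{r-1} + \lambda T^{r-2} + \cdots + \lambda^{r-2} T + \lambda^{r-1})\,w \\ &= r\lambda^{r-1} w. \end{aligned}$$

(For the last equality, one uses the fact that $Tw = \lambda w$.) Since $r\lambda^{r-1} w = 0$, $w = 0$. □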
We go back for a moment to the results of this section. Where has the hypothesis that
$V$ be a vector space over the complex numbers been used? The answer is that its only use is
to ensure that the characteristic polynomial has enough roots.


Corollary 4.7.15 Let $V$ be a finite-dimensional vector space over a field $F$, and let $T$ be a
linear operator on $V$ whose characteristic polynomial factors into linear factors in $F$. The
Jordan Decomposition theorem 4.7.10 is true for $T$. □

The proof is identical to the one given for the case that $F = \mathbb{C}$.


Corollary 4.7.16 Let $T$ be a linear operator on a finite-dimensional vector space over a field $F$
of characteristic zero. Assume that $T^r = I$ for some $r \ge 1$ and that the polynomial $t^r - 1$
factors into linear factors in $F$. Then $T$ is diagonalizable. □

The characteristic zero hypothesis is needed to carry through the last step of the proof
of Theorem 4.7.14, where from the relation $r\lambda^{r-1} w = 0$ we want to conclude that $w = 0$.
The theorem is false in characteristic different from zero.
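A small computation shows what goes wrong in characteristic $p$. The sketch below is an illustration under our own choices: the prime $p = 5$, the matrix `A`, and the helper names are not from the text. It checks that $A^p = I$ over the field with $p$ elements, and then lists all eigenvectors by brute force to see that they span only a line.

```python
def matmul_mod(A, B, p):
    """Matrix product with entries reduced modulo p."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]

def matpow_mod(A, r, p):
    """A^r modulo p by repeated multiplication."""
    n = len(A)
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(r):
        P = matmul_mod(P, A, p)
    return P

p = 5                       # an arbitrary prime, chosen for illustration
A = [[1, 1],
     [0, 1]]
identity = [[1, 0], [0, 1]]
power_is_identity = matpow_mod(A, p, p) == identity

# Nonzero vectors (x, y)^t with A(x, y)^t = c(x, y)^t for some scalar c, by brute force.
eigenvectors = [
    (x, y)
    for x in range(p) for y in range(p)
    if (x, y) != (0, 0)
    and any(((x + y) % p, y % p) == ((c * x) % p, (c * y) % p) for c in range(p))
]
spans_plane = any(y != 0 for _, y in eigenvectors)
print(power_is_identity, eigenvectors, spans_plane)
```

Every eigenvector found is a multiple of $e_1$, so there is no basis of eigenvectors, although $A^p = I$. In the proof of Theorem 4.7.14 this is the case $r = p$, where $r\lambda^{r-1} = 0$ in the field.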


[Rebus: image not reproduced]


—Yvonne Verdier? 


EXERCISES 


Section 1 The Dimension Formula 
1.1. Let $A$ be an $\ell \times m$ matrix and let $B$ be an $n \times p$ matrix. Prove that the rule $M \rightsquigarrow AMB$
defines a linear transformation from the space $F^{m \times n}$ of $m \times n$ matrices to the space $F^{\ell \times p}$.
1.2. Let $v_1, \ldots, v_n$ be elements of a vector space $V$. Prove that the map $\varphi: F^n \to V$ defined
by $\varphi(X) = v_1 x_1 + \cdots + v_n x_n$ is a linear transformation.
1.3. Let $A$ be an $m \times n$ matrix. Use the dimension formula to prove that the space of solutions
of the linear system $AX = 0$ has dimension at least $n - m$.

1.4. Prove that every $m \times n$ matrix $A$ of rank 1 has the form $A = XY^t$, where $X, Y$ are $m$- and
$n$-dimensional column vectors. How uniquely determined are these vectors?


²I’ve received many emails asking about this rebus. Yvonne, an anthropologist, and her husband Jean-Louis, a
mathematician, were close friends who died tragically in 1989. In their memory, I included them among the people
quoted. The history of the valentine was one of Yvonne’s many interests, and she sent this rebus as a valentine.

1.5. (a) Let $U$ and $W$ be vector spaces over a field $F$. Show that the two operations
$(u, w) + (u', w') = (u + u', w + w')$ and $c(u, w) = (cu, cw)$ on pairs of vectors
make the product set $U \times W$ into a vector space. It is called the product space.

(b) Let $U$ and $W$ be subspaces of a vector space $V$. Show that the map $T: U \times W \to V$
defined by $T(u, w) = u + w$ is a linear transformation.


(c) Express the dimension formula for T in terms of the dimensions of subspaces of V. 


Section 2 The Matrix of a Linear Transformation 
2.1. Let $A$ and $B$ be $2 \times 2$ matrices. Determine the matrix of the operator $T: M \rightsquigarrow AMB$ on the
space $F^{2 \times 2}$ of $2 \times 2$ matrices, with respect to the basis $(e_{11}, e_{12}, e_{21}, e_{22})$ of $F^{2 \times 2}$.

2.2. Let $A$ be an $n \times n$ matrix, and let $V$ denote the space of $n$-dimensional row vectors. What
is the matrix of the linear operator "right multiplication by $A$" with respect to the standard
basis of $V$?

2.3. Find all real $2 \times 2$ matrices that carry the line $y = x$ to the line $y = 3x$.
2.4. Prove Theorem 4.2.10(b) using row and column operations.


2.5. ³Let $A$ be an $m \times n$ matrix of rank $r$, let $I$ be a set of $r$ row indices such that the
corresponding rows of $A$ are independent, and let $J$ be a set of $r$ column indices
such that the corresponding columns of $A$ are independent. Let $M$ denote the $r \times r$
submatrix of $A$ obtained by taking rows from $I$ and columns from $J$. Prove that $M$ is
invertible.


Section 3 Linear Operators 
3.1. Determine the dimensions of the kernel and the image of the linear operator $T$ on the
space $\mathbb{R}^n$ defined by $T(x_1, \ldots, x_n)^t = (x_1 + x_n, x_2 + x_{n-1}, \ldots, x_n + x_1)^t$.

3.2. (a) Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a real matrix, with $c$ not zero. Show that using conjugation by
elementary matrices, one can eliminate the "a" entry.
(b) Which matrices with $c = 0$ are similar to a matrix in which the "a" entry is zero?

3.3. Let $T: V \to V$ be a linear operator on a vector space of dimension 2. Assume that $T$ is not
multiplication by a scalar. Prove that there is a vector $v$ in $V$ such that $(v, T(v))$ is a basis
of $V$, and describe the matrix of $T$ with respect to that basis.

3.4. Let $B$ be a complex $n \times n$ matrix. Prove or disprove: The linear operator $T$ on the space of
all $n \times n$ matrices defined by $T(A) = AB - BA$ is singular.


Section 4 Eigenvectors

4.1. Let $T$ be a linear operator on a vector space $V$, and let $\lambda$ be a scalar. The eigenspace $V^{(\lambda)}$
is the set of eigenvectors of $T$ with eigenvalue $\lambda$, together with 0. Prove that $V^{(\lambda)}$ is a
$T$-invariant subspace.

4.2. (a) Let $T$ be a linear operator on a finite-dimensional vector space $V$, such that $T^2$ is the
identity operator. Prove that for any vector $v$ in $V$, $v - Tv$ is either an eigenvector with
eigenvalue $-1$, or the zero vector. With notation as in Exercise 4.1, prove that $V$ is the
direct sum of the eigenspaces $V^{(1)}$ and $V^{(-1)}$.

(b) Generalize this method to prove that a linear operator $T$ such that $T^4 = I$ decomposes
a complex vector space into a sum of four eigenspaces.

³Suggested by Robert DeMarco


4.3. Let $T$ be a linear operator on a vector space $V$. Prove that if $W_1$ and $W_2$ are $T$-invariant
subspaces of $V$, then $W_1 + W_2$ and $W_1 \cap W_2$ are $T$-invariant.

4.4. A $2 \times 2$ matrix $A$ has an eigenvector $v_1 = (1, 1)^t$ with eigenvalue 2 and also an eigenvector
$v_2 = (1, 2)^t$ with eigenvalue 3. Determine $A$.


4.5. Find all invariant subspaces of the real linear operator whose matrix is 
1 
1 1 
(a) ; l (| 2 


4.6. Let $P$ be the real vector space of polynomials $p(x) = a_0 + a_1 x + \cdots + a_n x^n$ of degree at
most $n$, and let $D$ denote the derivative $\frac{d}{dx}$, considered as a linear operator on $P$.

(a) Prove that $D$ is a nilpotent operator, meaning that $D^k = 0$ for sufficiently large $k$.
(b) Find the matrix of $D$ with respect to a convenient basis.
(c) Determine all $D$-invariant subspaces of $P$.

4.7. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a real $2 \times 2$ matrix. The condition that a column vector $X$ be an
eigenvector for left multiplication by $A$ is that $AX = Y$ be a scalar multiple of $X$, which
means that the slopes $s = x_2/x_1$ and $s' = y_2/y_1$ are equal.


(a) Find the equation in s that expresses this equality. 


(b) Suppose that the entries of A are positive real numbers. Prove that there is an 
eigenvector in the first quadrant and also one in the second quadrant. 


4.8. Let T be a linear operator on a finite-dimensional vector space for which every nonzero 
vector is an eigenvector. Prove that T is multiplication by a scalar. 


Section 5 The Characteristic Polynomial

5.1. Compute the characteristic polynomials and the complex eigenvalues and eigenvectors of


—2 2 1 i cos@ -sin@ 
OE aa Om | © | oF AE 


5.2. The characteristic polynomial of the matrix below is $t^3 - 4t - 1$. Determine the missing
entries.

$$\begin{bmatrix} 0 & 1 & 2 \\ 1 & 1 & 0 \\ 1 & * & * \end{bmatrix}$$

5.3. What complex numbers might be eigenvalues of a linear operator $T$ such that
(a) $T^r = I$, (b) $T^2 - 5T + 6I = 0$?

5.4. Find a recursive relation for the characteristic polynomial of the $k \times k$ matrix

$$\begin{bmatrix} 0 & 1 & & & \\ 1 & 0 & 1 & & \\ & 1 & \ddots & \ddots & \\ & & \ddots & \ddots & 1 \\ & & & 1 & 0 \end{bmatrix}$$

and compute the polynomial for $k \le 5$.

5.5. Which real $2 \times 2$ matrices have real eigenvalues? Prove that the eigenvalues are real if the
off-diagonal entries have the same sign.

5.6. Let $V$ be a vector space with basis $(v_0, \ldots, v_n)$ and let $a_0, \ldots, a_n$ be scalars. Define a linear
operator $T$ on $V$ by the rules $T(v_i) = v_{i+1}$ if $i < n$ and $T(v_n) = a_0 v_0 + a_1 v_1 + \cdots + a_n v_n$.
Determine the matrix of $T$ with respect to the given basis, and the characteristic polynomial
of $T$.

5.7. Do $A$ and $A^t$ have the same eigenvectors? the same eigenvalues?


5.8. Let $A = (a_{ij})$ be a $3 \times 3$ matrix. Prove that the coefficient of $t$ in the characteristic
polynomial is the sum of the symmetric $2 \times 2$ minors

$$\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \det \begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix} + \det \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}.$$

5.9. Consider the linear operator of left multiplication by an $m \times m$ matrix $A$ on the space
$F^{m \times m}$ of all $m \times m$ matrices. Determine the trace and the determinant of this operator.

5.10. Let $A$ and $B$ be $n \times n$ matrices. Determine the trace and the determinant of the operator
on the space $F^{n \times n}$ defined by $M \rightsquigarrow AMB$.


Section 6 Triangular and Diagonal Forms

6.1. Let $A$ be an $n \times n$ matrix whose characteristic polynomial factors into linear factors:
$p(t) = (t - \lambda_1) \cdots (t - \lambda_n)$. Prove that trace $A = \lambda_1 + \cdots + \lambda_n$, that $\det A = \lambda_1 \cdots \lambda_n$.

6.2. Suppose that a complex $n \times n$ matrix $A$ has distinct eigenvalues $\lambda_1, \ldots, \lambda_n$, and let
$v_1, \ldots, v_n$ be eigenvectors with these eigenvalues.

(a) Show that every eigenvector is a multiple of one of the vectors $v_i$.
(b) Show how one can recover the matrix from the eigenvalues and eigenvectors.

6.3. Let $T$ be a linear operator that has two linearly independent eigenvectors with the same
eigenvalue $\lambda$. Prove that $\lambda$ is a multiple root of the characteristic polynomial of $T$.

6.4. Let $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$. Find a matrix $P$ such that $P^{-1}AP$ is diagonal, and find a formula for the
matrix $A^{22}$.

6.5. In each case, find a complex matrix $P$ such that $P^{-1}AP$ is diagonal.


. 001 : 
(a) [i at (b) o ole bees ges 
1 0 


; sn@  cos@ 
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6.6. Suppose that A is diagonalizable. Can the diagonalization be done with a matrix P in the 
special linear group? 

6.7. Prove that if $A$ and $B$ are $n \times n$ matrices and $A$ is nonsingular, then $AB$ is similar to $BA$.

6.8. A linear operator $T$ is nilpotent if some positive power $T^k$ is zero. Prove that $T$ is nilpotent
if and only if there is a basis of $V$ such that the matrix of $T$ is upper triangular, with
diagonal entries zero.

6.9. Find all real $2 \times 2$ matrices such that $A^2 = I$, and describe geometrically the way they
operate by left multiplication on $\mathbb{R}^2$.

6.10. Let $M$ be a matrix made up of two diagonal blocks: $M = \begin{bmatrix} A & 0 \\ 0 & D \end{bmatrix}$. Prove that $M$ is
diagonalizable if and only if $A$ and $D$ are diagonalizable.

6.11. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a $2 \times 2$ matrix with eigenvalue $\lambda$.
(a) Show that unless it is zero, the vector $(b, \lambda - a)^t$ is an eigenvector.

(b) Find a matrix $P$ such that $P^{-1}AP$ is diagonal, assuming that $b \neq 0$ and that $A$ has distinct
eigenvalues.


Section 7 Jordan Form 


7.1. Determine the Jordan form of the matrix $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$.

7.2. Prove that $A = \begin{bmatrix} 1 & 1 & 1 \\ -1 & -1 & -1 \\ 1 & 1 & 1 \end{bmatrix}$ is an idempotent matrix, i.e., that $A^2 = A$, and find its
Jordan form.

7.3. Let $V$ be a complex vector space of dimension 5, and let $T$ be a linear operator on $V$
whose characteristic polynomial is $(t - \lambda)^5$. Suppose that the rank of the operator $T - \lambda I$
is 2. What are the possible Jordan forms for $T$?

7.4. (a) Determine all possible Jordan forms for a matrix whose characteristic polynomial is
$(t + 2)^2 (t - 5)^3$.

(b) What are the possible Jordan forms for a matrix whose characteristic polynomial is
$(t + 2)^2 (t - 5)^3$, when the space of eigenvectors with eigenvalue $-2$ is one-dimensional, and
the space of eigenvectors with eigenvalue 5 is two-dimensional?

7.5. What is the Jordan form of a matrix A all of whose eigenvectors are multiples of a single 
vector? 


7.6. Determine all invariant subspaces of a linear operator whose Jordan form consists of one 
block. 


7.7. Is every complex square matrix A such that A* = A diagonalizable? 
7.8. Is every complex square matrix A similar to its transpose? 


7.9. Find a $2 \times 2$ matrix with entries in $\mathbb{F}_p$ that has a power equal to the identity and an
eigenvalue in $\mathbb{F}_p$, but is not diagonalizable.

Miscellaneous Problems

M.1. Let $v = (a_1, \ldots, a_n)$ be a real row vector. We may form the $n! \times n$ matrix $M$ whose rows
are obtained by permuting the entries of $v$ in all possible ways. The rows can be listed in
an arbitrary order. Thus if $n = 3$, $M$ might be

$$\begin{bmatrix} a_1 & a_2 & a_3 \\ a_1 & a_3 & a_2 \\ a_2 & a_3 & a_1 \\ a_2 & a_1 & a_3 \\ a_3 & a_1 & a_2 \\ a_3 & a_2 & a_1 \end{bmatrix}.$$


Determine the possible ranks that such a matrix could have. 


M.2. Let $A$ be a complex $n \times n$ matrix with $n$ distinct eigenvalues $\lambda_1, \ldots, \lambda_n$. Assume that $\lambda_1$
is the largest eigenvalue, that is, that $|\lambda_1| > |\lambda_i|$ for all $i > 1$.

(a) Prove that for most vectors $X$, the sequence $X_k = \lambda_1^{-k} A^k X$ converges to an
eigenvector $Y$ with eigenvalue $\lambda_1$, and describe precisely what the conditions on $X$
are for this to be true.

(b) Prove the same thing without assuming that the eigenvalues $\lambda_1, \ldots, \lambda_n$ are distinct.


M.3. Compute the largest eigenvalue of the matrix E i to three-place accuracy, using a 
method based on Exercise M.2. 

M.4. If $X = (x_1, x_2, \ldots)$ is an infinite real row vector and $A = (a_{ij})$, $0 < i, j < \infty$ is an infinite
real matrix, one may or may not be able to define the matrix product $XA$. For which $A$ can
one define right multiplication on the space $\mathbb{R}^\infty$ of all infinite row vectors (3.7.1)? on the
space $Z$ (3.7.2)?


*M.5. Let $\varphi: F^n \to F^m$ be left multiplication by an $m \times n$ matrix $A$.

(a) Prove that the following are equivalent:
• $A$ has a right inverse, a matrix $B$ such that $AB = I$,
• $\varphi$ is surjective,
• the rank of $A$ is $m$.

(b) Prove that the following are equivalent:
• $A$ has a left inverse, a matrix $B$ such that $BA = I$,
• $\varphi$ is injective,
• the rank of $A$ is $n$.

M.6. Without using the characteristic polynomial, prove that a linear operator on a vector space
of dimension $n$ can have at most $n$ distinct eigenvalues.


*M.7. (powers of an operator) Let $T$ be a linear operator on a vector space $V$. Let $K_r$ and $W_r$
denote the kernel and image, respectively, of $T^r$.

(a) Show that $K_1 \subset K_2 \subset \cdots$ and that $W_1 \supset W_2 \supset \cdots$.
(b) The following conditions might or might not hold for a particular value of $r$:
(1) $K_r = K_{r+1}$, (2) $W_r = W_{r+1}$, (3) $W_r \cap K_1 = \{0\}$, (4) $W_1 + K_r = V$.
Find all implications among the conditions (1)–(4) when $V$ is finite dimensional.
(c) Do the same thing when $V$ is infinite dimensional.

M.8. Let $T$ be a linear operator on a finite-dimensional complex vector space $V$.

(a) Let $\lambda$ be an eigenvalue of $T$, and let $V_\lambda$ be the set of generalized eigenvectors, together
with the zero vector. Prove that $V_\lambda$ is a $T$-invariant subspace of $V$. (This subspace is
called a generalized eigenspace.)

(b) Prove that $V$ is the direct sum of its generalized eigenspaces.

M.9. Let $V$ be a finite-dimensional vector space. A linear operator $T: V \to V$ is called a
projection if $T^2 = T$ (not necessarily an "orthogonal projection"). Let $K$ and $W$ be the
kernel and image of a linear operator $T$. Prove
(a) $T$ is a projection onto $W$ if and only if the restriction of $T$ to $W$ is the identity map.
(b) If $T$ is a projection, then $V$ is the direct sum $W \oplus K$.
(c) The trace of a projection $T$ is equal to its rank.

M.10. Let $A$ and $B$ be $m \times n$ and $n \times m$ real matrices.

(a) Prove that if $\lambda$ is a nonzero eigenvalue of the $m \times m$ matrix $AB$ then it is also an
eigenvalue of the $n \times n$ matrix $BA$. Show by example that this need not be true if
$\lambda = 0$.

(b) Prove that $I_m - AB$ is invertible if and only if $I_n - BA$ is invertible.


CHAPTER 5 


Applications of Linear Operators 


By relieving the brain from all unnecessary work, 
a good notation sets it free to concentrate 
on more advanced problems. 


—Alfred North Whitehead 


5.1 ORTHOGONAL MATRICES AND ROTATIONS 


In this section, the field of scalars is the real number field. 
We assume familiarity with the dot product of vectors in $\mathbb{R}^2$. The dot product of column
vectors $X = (x_1, \ldots, x_n)^t$, $Y = (y_1, \ldots, y_n)^t$ in $\mathbb{R}^n$ is defined to be

(5.1.1) $(X \cdot Y) = x_1 y_1 + \cdots + x_n y_n$.

It is convenient to write the dot product as the matrix product of a row vector and a column
vector:

(5.1.2) $(X \cdot Y) = X^t Y$.

For vectors in $\mathbb{R}^2$, one has the formula

(5.1.3) $(X \cdot Y) = |X||Y| \cos\theta$,

where $\theta$ is the angle between the vectors. This formula follows from the law of cosines

(5.1.4) $c^2 = a^2 + b^2 - 2ab\cos\theta$

for the side lengths $a, b, c$ of a triangle, where $\theta$ is the angle between the sides $a$ and $b$.
To derive (5.1.3), we apply the law of cosines to the triangle with vertices $0, X, Y$. Its side
lengths are $|X|$, $|Y|$, and $|X - Y|$, so the law of cosines can be written as

$((X - Y) \cdot (X - Y)) = (X \cdot X) + (Y \cdot Y) - 2|X||Y| \cos\theta$.

The left side expands to $(X \cdot X) - 2(X \cdot Y) + (Y \cdot Y)$, and formula (5.1.3) is obtained by
comparing this with the right side. The formula is valid for vectors in $\mathbb{R}^n$ too, but it requires
understanding the meaning of the angle, and we won’t take the time to go into that just now
(see (8.5.2)).

The most important points for vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$ are


• the square $|X|^2$ of the length of a vector $X$ is $(X \cdot X) = X^t X$, and
• a vector $X$ is orthogonal to another vector $Y$, written $X \perp Y$, if and only if
$X^t Y = 0$.

We take these as the definitions of the length $|X|$ of a vector and of orthogonality of
vectors in $\mathbb{R}^n$. Note that the length $|X|$ is positive unless $X$ is the zero vector, because
$|X|^2 = X^t X = x_1^2 + \cdots + x_n^2$ is a sum of squares.

Theorem 5.1.5 Pythagoras. If $X \perp Y$ and $Z = X + Y$, then $|Z|^2 = |X|^2 + |Y|^2$.
This is proved by expanding $Z^t Z$. If $X \perp Y$, then $X^t Y = Y^t X = 0$, so
$Z^t Z = (X + Y)^t (X + Y) = X^t X + X^t Y + Y^t X + Y^t Y = X^t X + Y^t Y$. □


We switch to our lowercase vector notation. If $v_1, \ldots, v_k$ are orthogonal vectors in $\mathbb{R}^n$
and if $w = v_1 + \cdots + v_k$, then Pythagoras’s theorem shows by induction that

(5.1.6) $|w|^2 = |v_1|^2 + \cdots + |v_k|^2$.

Lemma 5.1.7 Any set $(v_1, \ldots, v_k)$ of orthogonal nonzero vectors in $\mathbb{R}^n$ is independent.

Proof. Let $w = c_1 v_1 + \cdots + c_k v_k$ be a linear combination, where not all $c_i$ are zero, and let
$w_i = c_i v_i$. Then $w$ is the sum $w_1 + \cdots + w_k$ of orthogonal vectors, not all of which are zero.
By Pythagoras, $|w|^2 = |w_1|^2 + \cdots + |w_k|^2 > 0$, so $w \neq 0$. □

• An orthonormal basis $\mathbf{B} = (v_1, \ldots, v_n)$ of $\mathbb{R}^n$ is a basis of orthogonal unit vectors (vectors
of length one). Another way to say this is that $\mathbf{B}$ is an orthonormal basis if

(5.1.8) $(v_i \cdot v_j) = \delta_{ij}$,

where $\delta_{ij}$, the Kronecker delta, is the $i,j$-entry of the identity matrix, which is equal to 1 if
$i = j$ and to 0 if $i \neq j$.

Definition 5.1.9 A real $n \times n$ matrix $A$ is orthogonal if $A^t A = I$, which is to say, $A$ is invertible
and its inverse is $A^t$.

Lemma 5.1.10 An $n \times n$ matrix $A$ is orthogonal if and only if its columns form an orthonormal
basis of $\mathbb{R}^n$.

Proof. Let $A_i$ denote the $i$th column of $A$. Then $A_i^t$ is the $i$th row of $A^t$. The $i,j$-entry of $A^t A$
is $A_i^t A_j$, so $A^t A = I$ if and only if $A_i^t A_j = \delta_{ij}$ for all $i$ and $j$. □


The next properties of orthogonal matrices are easy to verify: 
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Proposition 5.1.11 


(a) The product of orthogonal matrices is orthogonal, and the inverse of an orthogonal
matrix, its transpose, is orthogonal. The orthogonal matrices form a subgroup $O_n$ of
$GL_n$, the orthogonal group.

(b) The determinant of an orthogonal matrix is $\pm 1$. The orthogonal matrices with
determinant 1 form a subgroup $SO_n$ of $O_n$ of index 2, the special orthogonal group. □


Definition 5.1.12 An orthogonal operator $T$ on $\mathbb{R}^n$ is a linear operator that preserves the dot
product: For every pair $X, Y$ of vectors,

$(TX \cdot TY) = (X \cdot Y)$.

Proposition 5.1.13 A linear operator $T$ on $\mathbb{R}^n$ is orthogonal if and only if it preserves lengths
of vectors, or, if and only if for every vector $X$, $(TX \cdot TX) = (X \cdot X)$.

Proof. Suppose that lengths are preserved, and let $X$ and $Y$ be arbitrary vectors in $\mathbb{R}^n$.
Then

$(T(X + Y) \cdot T(X + Y)) = ((X + Y) \cdot (X + Y))$.

The fact that $(TX \cdot TY) = (X \cdot Y)$ follows by expanding the two sides of this equality and
cancelling. □


Proposition 5.1.14 A linear operator $T$ on $\mathbb{R}^n$ is orthogonal if and only if its matrix $A$ with
respect to the standard basis is an orthogonal matrix.

Proof. If $A$ is the matrix of $T$, then

$(TX \cdot TY) = (AX)^t (AY) = X^t (A^t A) Y$.

The operator is orthogonal if and only if the right side is equal to $X^t Y$ for all $X$ and $Y$. We
can write this condition as $X^t (A^t A - I) Y = 0$. The next lemma shows that this is true if and
only if $A^t A - I = 0$, that is, if and only if $A$ is orthogonal. □

Lemma 5.1.15 Let $M$ be an $n \times n$ matrix. If $X^t M Y = 0$ for all column vectors $X$ and $Y$, then
$M = 0$.

Proof. The product $e_i^t M e_j$ evaluates to the $i,j$-entry of $M$. For instance,

$$\begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = m_{21}.$$

If $e_i^t M e_j = 0$ for all $i$ and $j$, then $M = 0$. □


We now describe the orthogonal $2 \times 2$ matrices.

• A linear operator $T$ on $\mathbb{R}^2$ is a reflection if it has orthogonal eigenvectors $v_1$ and $v_2$ with
eigenvalues 1 and $-1$, respectively.

Because it fixes $v_1$ and changes the sign of the orthogonal vector $v_2$, such an operator
reflects the plane about the one-dimensional subspace spanned by $v_1$. Reflection about the
$e_1$-axis is given by the matrix

(5.1.16) $S_0 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$.
Theorem 5.1.17
(a) The orthogonal $2 \times 2$ matrices with determinant 1 are the matrices

(5.1.18) $R = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}$,

with $c = \cos\theta$ and $s = \sin\theta$, for some angle $\theta$. The matrix $R$ represents counterclockwise
rotation of the plane $\mathbb{R}^2$ about the origin and through the angle $\theta$.

(b) The orthogonal $2 \times 2$ matrices $A$ with determinant $-1$ are the matrices

(5.1.19) $S = \begin{bmatrix} c & s \\ s & -c \end{bmatrix} = RS_0$,

with $c$ and $s$ as above. The matrix $S$ reflects the plane about the one-dimensional
subspace of $\mathbb{R}^2$ that makes an angle $\frac{1}{2}\theta$ with the $e_1$-axis.

Proof. Say that

$A = \begin{bmatrix} c & * \\ s & * \end{bmatrix}$

is orthogonal. Then its columns are unit vectors (5.1.10), so the point $(c, s)^t$ lies on the unit
circle, and $c = \cos\theta$ and $s = \sin\theta$, for some angle $\theta$. We inspect the product $P = R^t A$, where
$R$ is the matrix (5.1.18):

(5.1.20) $P = R^t A = \begin{bmatrix} 1 & * \\ 0 & * \end{bmatrix}$.

Since $R^t$ and $A$ are orthogonal, so is $P$. Lemma 5.1.10 tells us that the second column is a unit
vector orthogonal to the first one. So

(5.1.21) $P = \begin{bmatrix} 1 & 0 \\ 0 & \pm 1 \end{bmatrix}$.

Working back, $A = RP$, so $A = R$ if $\det A = 1$ and $A = S = RS_0$ if $\det A = -1$.

We've seen that $R$ represents a rotation (4.2.2), but we must still identify the operator
defined by the matrix $S$. The characteristic polynomial of $S$ is $t^2 - 1$, so its eigenvalues are
1 and $-1$. Let $X_1$ and $X_2$ be unit-length eigenvectors with these eigenvalues. Because $S$ is
orthogonal,

$(X_1 \cdot X_2) = (SX_1 \cdot SX_2) = (X_1 \cdot -X_2) = -(X_1 \cdot X_2)$.

It follows that $(X_1 \cdot X_2) = 0$. The eigenvectors are orthogonal. The span of $X_1$ will be the
line of reflection. To determine this line, we write a unit vector $X$ as $(c', s')^t$, with $c' = \cos\alpha$
and $s' = \sin\alpha$. Then

$SX = \begin{bmatrix} cc' + ss' \\ sc' - cs' \end{bmatrix} = \begin{bmatrix} \cos(\theta - \alpha) \\ \sin(\theta - \alpha) \end{bmatrix}$.

When $\alpha = \frac{1}{2}\theta$, $X$ is an eigenvector with eigenvalue 1, a fixed vector. □
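The statements of Theorem 5.1.17 are easy to test numerically. The sketch below uses an arbitrary test angle and helper names of our own choosing. It builds the matrices of (5.1.18) and (5.1.19), checks that both are orthogonal with determinants 1 and $-1$, that $S = RS_0$, and that $S$ fixes the unit vector at angle $\frac{1}{2}\theta$.

```python
import math

def rotation(theta):
    """The matrix R of (5.1.18)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def reflection(theta):
    """The matrix S of (5.1.19)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def close(A, B, tol=1e-12):
    """Entrywise comparison up to rounding error."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

I2 = [[1.0, 0.0], [0.0, 1.0]]
S0 = [[1.0, 0.0], [0.0, -1.0]]
theta = 0.7                                   # an arbitrary test angle
R, S = rotation(theta), reflection(theta)

orthogonal = close(matmul(transpose(R), R), I2) and close(matmul(transpose(S), S), I2)
dets = (det2(R), det2(S))
S_is_R_S0 = close(S, matmul(R, S0))

half = theta / 2
X = [[math.cos(half)], [math.sin(half)]]      # unit vector at angle theta/2
fixed = close(matmul(S, X), X)
print(orthogonal, dets, S_is_R_S0, fixed)
```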
We describe the $3 \times 3$ rotation matrices next.


Definition 5.1.22 A rotation of $\mathbb{R}^3$ about the origin is a linear operator $\rho$ with these
properties:

• $\rho$ fixes a unit vector $u$, called a pole of $\rho$, and
• $\rho$ rotates the two-dimensional subspace $W$ orthogonal to $u$.

The axis of rotation is the line $\ell$ spanned by $u$. We also call the identity operator a rotation,
though its axis is indeterminate.
If multiplication by a $3 \times 3$ matrix $R$ is a rotation of $\mathbb{R}^3$, $R$ is called a rotation matrix.

(5.1.23) A Rotation of $\mathbb{R}^3$.


The sign of the angle of rotation depends on how the subspace $W$ is oriented. We’ll orient
$W$ looking at it from the head of the arrow $u$. The angle $\theta$ shown in the figure is positive.
(This is the "right hand rule.")

When $u$ is the vector $e_1$, the set $(e_2, e_3)$ will be a basis for $W$, and the matrix of $\rho$ will
have the form

(5.1.24) $M = \begin{bmatrix} 1 & 0 & 0 \\ 0 & c & -s \\ 0 & s & c \end{bmatrix}$,

where the bottom right $2 \times 2$ minor is the rotation matrix (5.1.18).


• A rotation that is not the identity is described by the pair $(u, \theta)$, called a spin, that consists
of a pole $u$ and a nonzero angle of rotation $\theta$.

The rotation with spin $(u, \theta)$ may be denoted by $\rho_{(u,\theta)}$. Every rotation $\rho$ different
from the identity has two poles, the intersections of the axis of rotation $\ell$ with the unit sphere
in $\mathbb{R}^3$. These are the unit-length eigenvectors of $\rho$ with eigenvalue 1. The choice of a pole
$u$ defines a direction on $\ell$, and a change of direction causes a change of sign in the angle
of rotation. If $(u, \theta)$ is a spin of $\rho$, so is $(-u, -\theta)$. Thus every rotation has two spins, and

$\rho_{(u,\theta)} = \rho_{(-u,-\theta)}$.


Theorem 5.1.25 Euler’s Theorem. The $3 \times 3$ rotation matrices are the orthogonal $3 \times 3$
matrices with determinant 1, the elements of the special orthogonal group $SO_3$.

Euler’s Theorem has a remarkable consequence, which follows from the fact that $SO_3$ is a
group. It is not obvious, either algebraically or geometrically.

Corollary 5.1.26 The composition of rotations about any two axes is a rotation about some
other axis. □

Because their elements represent rotations, the groups $SO_2$ and $SO_3$ are called the
two- and three-dimensional rotation groups. Things become more complicated in dimension
greater than 3. The $4 \times 4$ matrix

(5.1.27) $\begin{bmatrix} \cos\alpha & -\sin\alpha & & \\ \sin\alpha & \cos\alpha & & \\ & & \cos\beta & -\sin\beta \\ & & \sin\beta & \cos\beta \end{bmatrix}$

is an element of $SO_4$. Left multiplication by this matrix rotates the two-dimensional subspace
spanned by $(e_1, e_2)$ through the angle $\alpha$, and it rotates the subspace spanned by $(e_3, e_4)$
through the angle $\beta$.

Before beginning the proof of Euler’s Theorem, we note two more consequences: 


Corollary 5.1.28 Let $M$ be the matrix in $SO_3$ that represents the rotation $\rho_{(u,\alpha)}$ with
spin $(u, \alpha)$.

(a) The trace of $M$ is $1 + 2\cos\alpha$.
(b) Let $B$ be another element of $SO_3$, and let $u' = Bu$. The conjugate $M' = BMB^t$ represents
the rotation $\rho_{(u',\alpha)}$ with spin $(u', \alpha)$.

Proof. (a) We choose an orthonormal basis $(v_1, v_2, v_3)$ of $\mathbb{R}^3$ such that $v_1 = u$. The matrix
of $\rho$ with respect to this new basis will have the form (5.1.24), and its trace will be $1 + 2\cos\alpha$.
Since the trace doesn’t depend on the basis, the trace of $M$ is $1 + 2\cos\alpha$ too.

(b) Since $SO_3$ is a group, $M'$ is an element of $SO_3$. Euler’s Theorem tells us that $M'$ is a
rotation matrix. Moreover, $u'$ is a pole of this rotation: Since $B$ is orthogonal, $u' = Bu$ has
length 1, and

$M'u' = BMB^{-1}u' = BMu = Bu = u'$.

Let $\alpha'$ be the angle of rotation of $M'$ about the pole $u'$. The traces of $M$ and its conjugate
$M'$ are equal, so $\cos\alpha = \cos\alpha'$. This implies that $\alpha' = \pm\alpha$. Euler’s Theorem tells us that
the matrix $B$ also represents a rotation, say with angle $\beta$ about some pole. Since $B$ and $M'$
depend continuously on $\beta$, only one of the two values $\pm\alpha$ for $\alpha'$ can occur. When $\beta = 0$,
$B = I$, $M' = M$, and $\alpha' = \alpha$. Therefore $\alpha' = \alpha$ for all $\beta$. □


Lemma 5.1.29. A 3×3 orthogonal matrix $M$ with determinant 1 has an eigenvalue
equal to 1.


Proof. To show that 1 is an eigenvalue, we show that the determinant of the matrix $M - I$
is zero. If $B$ is an n×n matrix, $\det(-B) = (-1)^n \det B$. We are dealing with 3×3 matrices, so
$\det(M - I) = -\det(I - M)$. Also, $\det(M - I)^t = \det(M - I)$ and $\det M = 1$. Then


$\det(M - I) = \det(M - I)^t = \det M \det(M - I)^t = \det(M(M^t - I)) = \det(I - M).$

The relation $\det(M - I) = \det(I - M)$, together with $\det(M - I) = -\det(I - M)$, shows that $\det(M - I) = 0$. □


Proof of Euler's Theorem. Suppose that $M$ represents a rotation $\rho$ with spin $(u, \alpha)$. We
form an orthonormal basis $\mathbf{B}$ of $V$ by appending to $u$ an orthonormal basis of its orthogonal
space $W$. The matrix $M'$ of $\rho$ with respect to this basis will have the form (5.1.24), which
is orthogonal and has determinant 1. Moreover, $M = PM'P^{-1}$, where the matrix $P$ is equal
to $[\mathbf{B}]$ (3.5.13). Since its columns are orthonormal, $[\mathbf{B}]$ is orthogonal. Therefore $M$ is also
orthogonal, and its determinant is equal to 1.


Conversely, let $M$ be an orthogonal matrix with determinant 1, and let $T$ denote left
multiplication by $M$. Let $u$ be a unit-length eigenvector with eigenvalue 1 (one exists by
Lemma 5.1.29), and let $W$ be the two-dimensional space orthogonal to $u$. Since $T$ is an
orthogonal operator that fixes $u$, it sends $W$ to itself. So $W$ is a $T$-invariant subspace, and
we can restrict the operator to $W$.

Since $T$ is orthogonal, it preserves lengths (5.1.13), so its restriction to $W$ is orthogonal
too. Now $W$ has dimension 2, and we know the orthogonal operators in dimension 2: they are
the rotations and the reflections (5.1.17). The reflections are operators with determinant -1.
If an operator $T$ acts on $W$ as a reflection and fixes the orthogonal vector $u$, its determinant
will be -1 too. Since this is not the case, $T$ acts on $W$ as a rotation. This verifies the second
condition of Definition 5.1.22, and shows that $T$ is a rotation. □
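Corollary 5.1.26 and Corollary 5.1.28(a) can be checked numerically. The sketch below is only an illustration under stated assumptions: it builds rotation matrices with Rodrigues' formula (a standard formula that is not derived in this text), composes two rotations about different axes, and recovers a spin of the product from its trace and from its skew-symmetric part. The helper names `rotation`, `matmul`, `transpose`, `det3`, and `spin` are hypothetical.

```python
import math

def rotation(u, theta):
    """Matrix of the rotation with spin (u, theta); u must have length 1.

    Uses Rodrigues' formula (an assumption of this sketch, not taken from the text).
    For u = e1 it reduces to the matrix (5.1.24).
    """
    x, y, z = u
    c, s = math.cos(theta), math.sin(theta)
    t = 1 - c
    return [
        [c + t*x*x,   t*x*y - s*z, t*x*z + s*y],
        [t*x*y + s*z, c + t*y*y,   t*y*z - s*x],
        [t*x*z - s*y, t*y*z + s*x, c + t*z*z],
    ]

def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def det3(A):
    (a, b, c), (d, e, f), (g, h, i) = A
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def spin(M):
    """Return (pole, angle) with 0 < angle < pi.

    Assumes M is a rotation matrix that is neither the identity nor a half-turn.
    The angle comes from trace M = 1 + 2 cos(angle); the pole comes from
    M - M^t, whose entries encode 2 sin(angle) times the pole.
    """
    angle = math.acos((M[0][0] + M[1][1] + M[2][2] - 1) / 2)
    v = (M[2][1] - M[1][2], M[0][2] - M[2][0], M[1][0] - M[0][1])
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v), angle

# Compose a rotation about e1 with a rotation about e3.
M = matmul(rotation((1.0, 0.0, 0.0), 0.7), rotation((0.0, 0.0, 1.0), 1.1))
pole, angle = spin(M)
print("pole:", pole, "angle:", angle)
```

The product is again orthogonal with determinant 1, and rebuilding `rotation(pole, angle)` reproduces it up to rounding, as Euler's Theorem predicts.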


5.2 USING CONTINUITY 


Various facts about complex matrices can be deduced by diagonalization, using reasoning 
based on continuity that we explain here. 


A sequence $A_k$ of n×n matrices converges to an n×n matrix $A$ if for every $i$ and $j$, the
$i,j$-entry of $A_k$ converges to the $i,j$-entry of $A$. Similarly, a sequence $p_k(t)$, $k = 1, 2, \ldots$, of
polynomials of degree $n$ with complex coefficients converges to a polynomial $p(t)$ of degree
$n$ if for every $j$, the coefficient of $t^j$ in $p_k$ converges to the corresponding coefficient of $p$. We
may indicate that a sequence $S_k$ of complex numbers, matrices, or polynomials converges to
$S$ by writing $S_k \to S$.


Proposition 5.2.1 Continuity of Roots. Let $p_k(t)$ be a sequence of monic polynomials of
degree $n$, and let $p(t)$ be another monic polynomial of degree $n$. Let $\alpha_{k,1}, \ldots, \alpha_{k,n}$ and
$\alpha_1, \ldots, \alpha_n$ denote the roots of these polynomials.
(a) If $\alpha_{k,\nu} \to \alpha_\nu$ for $\nu = 1, \ldots, n$, then $p_k \to p$.
(b) Conversely, if $p_k \to p$, the roots $\alpha_{k,\nu}$ of $p_k$ can be numbered in such a way that
$\alpha_{k,\nu} \to \alpha_\nu$ for each $\nu = 1, \ldots, n$.


In part (b), the roots of each polynomial $p_k$ must be renumbered individually.


Proof. We note that $p_k(t) = (t - \alpha_{k,1}) \cdots (t - \alpha_{k,n})$ and $p(t) = (t - \alpha_1) \cdots (t - \alpha_n)$. Part
(a) follows from the fact that the coefficients of $p(t)$ are continuous functions (polynomial
functions, in fact) of the roots, but (b) is less obvious.


Step 1: Let $\alpha_{k,\nu}$ be a root of $p_k$ nearest to $\alpha_1$, i.e., such that $|\alpha_{k,\nu} - \alpha_1|$ is minimal. We
renumber the roots of $p_k$ so that this root becomes $\alpha_{k,1}$. Then


$|\alpha_1 - \alpha_{k,1}|^n \le |(\alpha_1 - \alpha_{k,1}) \cdots (\alpha_1 - \alpha_{k,n})| = |p_k(\alpha_1)|.$


The right side converges to $|p(\alpha_1)| = 0$. Therefore the left side does too, and this shows that
$\alpha_{k,1} \to \alpha_1$.


Step 2: We divide, writing $p_k(t) = (t - \alpha_{k,1})q_k(t)$ and $p(t) = (t - \alpha_1)q(t)$. Then $q_k$ and
$q$ are monic polynomials, and their roots are $\alpha_{k,2}, \ldots, \alpha_{k,n}$ and $\alpha_2, \ldots, \alpha_n$, respectively.
If we show that $q_k \to q$, then by induction on the degree $n$, we will be able to arrange the
roots of $q_k$ so that they converge to the roots of $q$, and we will be done.

To show that $q_k \to q$, we carry the division out explicitly. To simplify notation,
we drop the subscript 1 from $\alpha_1$. Say that $p(t) = t^n + a_{n-1}t^{n-1} + \cdots + a_1t + a_0$, that
$q(t) = t^{n-1} + b_{n-2}t^{n-2} + \cdots + b_1t + b_0$, and that the notation for $p_k$ and $q_k$ is analogous.
The equation $p(t) = (t - \alpha)q(t)$ implies that


$b_{n-2} = \alpha + a_{n-1},$

$b_{n-3} = \alpha^2 + \alpha a_{n-1} + a_{n-2},$

$\vdots$

$b_0 = \alpha^{n-1} + \alpha^{n-2}a_{n-1} + \cdots + \alpha a_2 + a_1.$


Since $\alpha_{k,1} \to \alpha$ and $a_{k,i} \to a_i$, it is true that $b_{k,j} \to b_j$. □
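The formulas for the coefficients $b_j$ amount to the recursion $b_{j-1} = \alpha b_j + a_j$, starting from the leading coefficient 1. This is synthetic division. A minimal sketch follows; the function name `deflate` and the coefficient ordering are choices made here for illustration.

```python
def deflate(a, alpha):
    """Divide the monic polynomial t^n + a[n-1] t^(n-1) + ... + a[0] by (t - alpha).

    `a` lists the non-leading coefficients a[0], ..., a[n-1].
    Returns (b, remainder): b lists the coefficients b[0], ..., b[n-2] of the
    monic quotient q, and remainder equals p(alpha), which is 0 at a root.
    """
    n = len(a)
    b = [0] * (n - 1)
    carry = 1                        # leading coefficient of the quotient
    for j in range(n - 1, 0, -1):    # computes b[n-2], then b[n-3], ..., b[0]
        carry = alpha * carry + a[j]
        b[j - 1] = carry
    remainder = alpha * carry + a[0]
    return b, remainder

# p(t) = (t - 1)(t - 2)(t - 3) = t^3 - 6t^2 + 11t - 6, divided by (t - 1)
quotient, remainder = deflate([-6, 11, -6], 1)
print(quotient, remainder)
```

Each $b_j$ is a polynomial expression in $\alpha$ and the $a_i$, so small changes in the root and in the coefficients of $p_k$ produce small changes in the coefficients of $q_k$. This is the continuity used in Step 2.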


Proposition 5.2.2. Let $A$ be an n×n complex matrix.


(a) There is a sequence of matrices $A_k$ that converges to $A$, and such that for all $k$ the
characteristic polynomial $p_k(t)$ of $A_k$ has distinct roots.

(b) If a sequence $A_k$ of matrices converges to $A$, the sequence $p_k(t)$ of its characteristic
polynomials converges to the characteristic polynomial $p(t)$ of $A$.

(c) Let $\lambda_i$ be the roots of the characteristic polynomial $p$. If $A_k \to A$, the roots $\lambda_{k,i}$ of $p_k$
can be numbered so that $\lambda_{k,i} \to \lambda_i$ for each $i$.


Proof. (a) Proposition 4.6.1 tells us that there is an invertible n×n matrix $P$ such that
$A' = P^{-1}AP$ is upper triangular. Its eigenvalues will be the diagonal entries of that matrix.
We let $A'_k$ be a sequence of matrices that converges to $A'$, whose off-diagonal entries are the
same as those of $A'$, and whose diagonal entries are distinct. Then $A'_k$ is upper triangular, and
its characteristic polynomial has distinct roots. Let $A_k = PA'_kP^{-1}$. Since matrix multiplication
is continuous, $A_k \to A$. The characteristic polynomial of $A_k$ is the same as that of $A'_k$, so it
has distinct roots.


Part (b) follows because the coefficients of the characteristic polynomial depend
continuously on the matrix entries, and then (c) follows from Proposition 5.2.1. □


One can use continuity to prove the famous Cayley-Hamilton Theorem. We state the 
theorem in its matrix form. 


Theorem 5.2.3 Cayley-Hamilton Theorem. Let $p(t) = t^n + c_{n-1}t^{n-1} + \cdots + c_1t + c_0$ be the
characteristic polynomial of an n×n complex matrix $A$. Then
$p(A) = A^n + c_{n-1}A^{n-1} + \cdots + c_1A + c_0I$ is the zero matrix.


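For example, the characteristic polynomial of the 2×2 matrix $A$, with entries $a, b, c, d$
as usual, is $t^2 - (a + d)t + (ad - bc)$ (4.5.12). The theorem asserts that


(5.2.4)    $\begin{pmatrix} a & b \\ c & d \end{pmatrix}^2 - (a + d)\begin{pmatrix} a & b \\ c & d \end{pmatrix} + (ad - bc)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$

This is easy to verify.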
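For readers who prefer to see the 2×2 verification carried out mechanically, here is a small sketch using exact integer arithmetic. The function name `cayley_hamilton_2x2` is made up for this illustration.

```python
def cayley_hamilton_2x2(a, b, c, d):
    """Return A^2 - (a + d) A + (ad - bc) I for A = [[a, b], [c, d]]."""
    A = [[a, b], [c, d]]
    A2 = [[sum(A[i][k] * A[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    trace, det = a + d, a * d - b * c
    I = [[1, 0], [0, 1]]
    return [[A2[i][j] - trace * A[i][j] + det * I[i][j] for j in range(2)]
            for i in range(2)]

# The matrix (5.3.9) used later in the chapter, as a sample input.
print(cayley_hamilton_2x2(3, 2, 1, 4))
```

With integer entries the result is the zero matrix exactly, so no rounding tolerance is needed.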


Proof of the Cayley-Hamilton Theorem. Step 1: The case that $A$ is a diagonal matrix.
Let the diagonal entries be $\lambda_1, \ldots, \lambda_n$. The characteristic polynomial is


$p(t) = (t - \lambda_1) \cdots (t - \lambda_n).$


Here $p(A)$ is also a diagonal matrix, and its diagonal entries are $p(\lambda_i)$. Since the $\lambda_i$ are the
roots of $p$, $p(\lambda_i) = 0$ and $p(A) = 0$.


Step 2: The case that the eigenvalues of $A$ are distinct.
In this case, $A$ is diagonalizable; say $A' = P^{-1}AP$ is diagonal. Then the characteristic
polynomial of $A'$ is the same as the characteristic polynomial $p(t)$ of $A$, and moreover,


$p(A) = Pp(A')P^{-1}$

(see (4.6.14)). By step 1, $p(A') = 0$, so $p(A) = 0$.


Step 3: The general case.

We apply Proposition 5.2.2. We let $A_k$ be a sequence of matrices with distinct
eigenvalues that converges to $A$. Let $p_k$ be the characteristic polynomial of $A_k$. Since the
sequence $p_k$ converges to the characteristic polynomial $p$ of $A$, $p_k(A_k) \to p(A)$. Step 2
tells us that $p_k(A_k) = 0$ for all $k$. Therefore $p(A) = 0$. □


5.3 SYSTEMS OF DIFFERENTIAL EQUATIONS


We learn in calculus that the solutions of the differential equation

(5.3.1)    $\frac{dx}{dt} = ax$

are $x(t) = ce^{at}$, where $c$ is an arbitrary real number. We review the proof because we want
to use the argument again. First, $ce^{at}$ does solve the equation. To show that every solution
has this form, let $x(t)$ be an arbitrary solution. We differentiate $e^{-at}x(t)$ using the product
rule:


(5.3.2)    $\frac{d}{dt}\left(e^{-at}x(t)\right) = \left(-ae^{-at}\right)x(t) + e^{-at}\left(ax(t)\right) = 0.$


Thus $e^{-at}x(t)$ is a constant $c$, and $x(t) = ce^{at}$.

To extend this solution to systems of constant coefficient differential equations, we use
the following terminology. A vector-valued function or matrix-valued function is a vector or
a matrix whose entries are functions of $t$:


(5.3.3)    $X(t) = \begin{pmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{pmatrix}, \quad A(t) = \begin{pmatrix} a_{11}(t) & \cdots & a_{1n}(t) \\ \vdots & & \vdots \\ a_{m1}(t) & \cdots & a_{mn}(t) \end{pmatrix}.$


The calculus operations of taking limits and differentiating are extended to vector-valued
and matrix-valued functions by performing the operations on each entry separately.
The derivative of a vector-valued or matrix-valued function is the function obtained by
differentiating each entry:


(5.3.4)    $\frac{dX}{dt} = \begin{pmatrix} x_1'(t) \\ \vdots \\ x_n'(t) \end{pmatrix}, \quad \frac{dA}{dt} = \begin{pmatrix} a_{11}'(t) & \cdots & a_{1n}'(t) \\ \vdots & & \vdots \\ a_{m1}'(t) & \cdots & a_{mn}'(t) \end{pmatrix},$


where $x_i'(t)$ is the derivative of $x_i(t)$, and so on. So $\frac{dX}{dt}$ is defined if and only if each of the
functions $x_i(t)$ is differentiable. The derivative can also be described in matrix notation:


(5.3.5)    $\frac{dX}{dt} = \lim_{h \to 0} \frac{X(t + h) - X(t)}{h}.$


Here $X(t + h) - X(t)$ is computed by vector addition and the $h$ in the denominator stands
for scalar multiplication by $h^{-1}$. The limit is obtained by evaluating the limit of each entry
separately. So the entries of (5.3.5) are the derivatives $x_i'(t)$. The analogous statement is true
for matrix-valued functions.

Many elementary properties of differentiation carry over to matrix-valued functions.
The product rule, whose proof is an exercise, is an example:


Lemma 5.3.6 Product Rule.


(a) Let $A(t)$ and $B(t)$ be differentiable matrix-valued functions of $t$, of suitable sizes so
that their product is defined. Then the matrix product $A(t)B(t)$ is differentiable, and its
derivative is

$\frac{d(AB)}{dt} = \frac{dA}{dt}B + A\frac{dB}{dt}.$

(b) Let $A_1, \ldots, A_k$ be differentiable matrix-valued functions of $t$, of suitable sizes so that
their product is defined. Then the matrix product $A_1 \cdots A_k$ is differentiable, and its
derivative is

$\frac{d}{dt}(A_1 \cdots A_k) = \sum_{i=1}^{k} A_1 \cdots A_{i-1}\left(\frac{dA_i}{dt}\right)A_{i+1} \cdots A_k.$


A system of homogeneous linear, first-order, constant-coefficient differential equations
is a matrix equation of the form


(5.3.7)    $\frac{dX}{dt} = AX,$


where $A$ is a constant n×n matrix and $X(t)$ is an $n$-dimensional vector-valued function.
Writing out such a system, we obtain a system of differential equations


(5.3.8)    $\frac{dx_1}{dt} = a_{11}x_1(t) + \cdots + a_{1n}x_n(t)$
           $\vdots$
           $\frac{dx_n}{dt} = a_{n1}x_1(t) + \cdots + a_{nn}x_n(t).$


The $x_i(t)$ are unknown functions, and the scalars $a_{ij}$ are given. For example, if


(5.3.9)    $A = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix},$

(5.3.7) becomes a system of two equations in two unknowns:

(5.3.10)    $\frac{dx_1}{dt} = 3x_1 + 2x_2$
            $\frac{dx_2}{dt} = x_1 + 4x_2.$


The simplest systems are those in which $A$ is a diagonal matrix with diagonal entries
$\lambda_i$. Then equation (5.3.8) reads


(5.3.11)    $\frac{dx_i}{dt} = \lambda_ix_i(t), \quad i = 1, \ldots, n.$

Here the unknown functions $x_i$ are not mixed up by the equations, so we can solve for each
one separately:


(5.3.12)    $x_i = c_ie^{\lambda_it},$


for some arbitrary constants $c_i$.
The observation that allows us to solve the differential equation (5.3.7) in many cases
is this: If $V$ is an eigenvector for $A$ with eigenvalue $\lambda$, i.e., if $AV = \lambda V$, then


(5.3.13)    $X = e^{\lambda t}V$


is a particular solution of (5.3.7). Here $e^{\lambda t}V$ must be interpreted as the product of the
variable scalar $e^{\lambda t}$ and the constant vector $V$. Differentiation operates on the scalar function,
fixing $V$, while multiplication by $A$ operates on the vector $V$, fixing the scalar $e^{\lambda t}$. Thus
$\frac{d}{dt}e^{\lambda t}V = \lambda e^{\lambda t}V$ and also $Ae^{\lambda t}V = \lambda e^{\lambda t}V$. For example,


$\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 2 \\ -1 \end{pmatrix}$


are eigenvectors of the matrix (5.3.9), with eigenvalues 5 and 2, respectively, and

(5.3.14)    $\begin{pmatrix} e^{5t} \\ e^{5t} \end{pmatrix}$ and $\begin{pmatrix} 2e^{2t} \\ -e^{2t} \end{pmatrix}$


solve the system (5.3.10).


This observation allows us to solve (5.3.7) whenever the matrix A has distinct real 
eigenvalues. In that case every solution will be a linear combination of the special solutions 
(5.3.13). To work this out, it is convenient to diagonalize. 


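Proposition 5.3.15 Let $A$ be an n×n matrix, and let $P$ be an invertible matrix such that
$\Lambda = P^{-1}AP$ is diagonal, with diagonal entries $\lambda_1, \ldots, \lambda_n$. The general solution of the system
$\frac{dX}{dt} = AX$ is $X = P\tilde{X}$, where $\tilde{X} = (c_1e^{\lambda_1t}, \ldots, c_ne^{\lambda_nt})^t$ solves the equation $\frac{d\tilde{X}}{dt} = \Lambda\tilde{X}$.


The coefficients $c_i$ are arbitrary. They are often determined by assigning initial
conditions, that is, the value of $X$ at some particular $t_0$.


Proof. We multiply the equation $\frac{d\tilde{X}}{dt} = \Lambda\tilde{X}$ by $P$: $P\frac{d\tilde{X}}{dt} = P\Lambda\tilde{X} = AP\tilde{X}$. But since $P$ is
constant, $P\frac{d\tilde{X}}{dt} = \frac{d(P\tilde{X})}{dt} = \frac{dX}{dt}$. Thus $\frac{dX}{dt} = AX$. This reasoning can be reversed, so $\tilde{X}$ solves
the equation with $\Lambda$ if and only if $X$ solves the equation with $A$. □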


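The matrix that diagonalizes the matrix (5.3.9) was computed before (4.6.8):


(5.3.16)    $A = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}, \quad P = \begin{pmatrix} 1 & 2 \\ 1 & -1 \end{pmatrix}, \quad \text{and} \quad \Lambda = \begin{pmatrix} 5 & \\ & 2 \end{pmatrix}.$

Thus

(5.3.17)    $X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = P\tilde{X} = \begin{pmatrix} 1 & 2 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} c_1e^{5t} \\ c_2e^{2t} \end{pmatrix} = \begin{pmatrix} c_1e^{5t} + 2c_2e^{2t} \\ c_1e^{5t} - c_2e^{2t} \end{pmatrix}.$


In other words, every solution is a linear combination of the two basic solutions (5.3.14).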
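To see how the coefficients are fixed by an initial condition, here is a short sketch for the matrix (5.3.9). It solves $PC = X(0)$ for $C = (c_1, c_2)^t$ by Cramer's rule and evaluates the resulting linear combination of the two basic solutions. The function names `coefficients` and `solution` are illustrative only, and the initial time is assumed to be $t_0 = 0$.

```python
import math

# Data from (5.3.9): the matrix A, its eigenvectors as the columns of P,
# and the corresponding eigenvalues.
A = [[3, 2], [1, 4]]
P = [[1, 2], [1, -1]]
EIGENVALUES = (5, 2)

def coefficients(x0):
    """Solve P c = x0 for c = (c1, c2) by Cramer's rule."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    c1 = (x0[0] * P[1][1] - P[0][1] * x0[1]) / det
    c2 = (P[0][0] * x0[1] - x0[0] * P[1][0]) / det
    return c1, c2

def solution(x0, t):
    """X(t) = c1 e^{5t} v1 + c2 e^{2t} v2, normalized so that X(0) = x0."""
    c1, c2 = coefficients(x0)
    e1 = math.exp(EIGENVALUES[0] * t)
    e2 = math.exp(EIGENVALUES[1] * t)
    return (c1 * e1 * P[0][0] + c2 * e2 * P[0][1],
            c1 * e1 * P[1][0] + c2 * e2 * P[1][1])

print(coefficients((3.0, 0.0)))
print(solution((3.0, 0.0), 0.3))
```

A finite-difference quotient of `solution` in `t` can be compared with `A` applied to `solution` as a rough numerical check that the equation $\frac{dX}{dt} = AX$ holds.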


We now consider the case that the coefficient matrix $A$ has distinct eigenvalues, but
that they are not all real. To copy the method used above, we first consider differential
equations of the form (5.3.1), in which $a$ is a complex number. Properly interpreted, the
solutions of such a differential equation still have the form $ce^{at}$. The only thing to remember
is that $e^{at}$ will now be a complex-valued function of the real variable $t$.

The definition of the derivative of a complex-valued function is the same as for
real-valued functions, provided that the limit (5.3.5) exists. There are no new features. We can
write any such function $x(t)$ in terms of its real and imaginary parts, which will be real-valued
functions, say


(5.3.18)    $x(t) = p(t) + iq(t).$


Then $x$ is differentiable if and only if $p$ and $q$ are differentiable, and if they are, the derivative
of $x$ is $p' + iq'$. This follows directly from the definition. The usual rules for differentiation,
such as the product rule, hold for complex-valued functions. These rules can be proved
either by applying the corresponding theorem for real functions to $p$ and $q$, or by copying
the proof for real functions.

The exponential of a complex number $a = r + si$ is defined to be


(5.3.19)    $e^a = e^{r+si} = e^r(\cos s + i\sin s).$


Differentiation of this formula shows that $de^{at}/dt = ae^{at}$. Therefore $ce^{at}$ solves the
differential equation (5.3.1), and the proof given at the beginning of the section shows that
these are the only solutions.

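Having extended the case of one equation to complex coefficients, we can use
diagonalization to solve a system of equations (5.3.7) when $A$ is a complex matrix with distinct
eigenvalues.

For example, let $A = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$. The vectors $v_1 = \begin{pmatrix} 1 \\ i \end{pmatrix}$ and $v_2 = \begin{pmatrix} i \\ 1 \end{pmatrix}$ are eigenvectors,
with eigenvalues $1 + i$ and $1 - i$, respectively. Let $\mathbf{B}$ denote the basis $(v_1, v_2)$. Then $A$ is
diagonalized by the matrix $P = [\mathbf{B}]$:


(5.3.20)    $P^{-1}AP = \frac{1}{2}\begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} = \begin{pmatrix} 1+i & \\ & 1-i \end{pmatrix} = \Lambda.$


Then $\tilde{X} = \begin{pmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{pmatrix} = \begin{pmatrix} c_1e^{(1+i)t} \\ c_2e^{(1-i)t} \end{pmatrix}$. The solutions of (5.3.7) are


(5.3.21)    $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = P\tilde{X} = \begin{pmatrix} c_1e^{(1+i)t} + ic_2e^{(1-i)t} \\ ic_1e^{(1+i)t} + c_2e^{(1-i)t} \end{pmatrix},$


where $c_1, c_2$ are arbitrary complex numbers. So every solution is a linear combination of the
two basic solutions


(5.3.22)    $\begin{pmatrix} e^{(1+i)t} \\ ie^{(1+i)t} \end{pmatrix}$ and $\begin{pmatrix} ie^{(1-i)t} \\ e^{(1-i)t} \end{pmatrix}.$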


However, these solutions aren’t very satisfactory, because we began with a system of 
differential equations with real coefficients, and the answer we obtained is complex. When 
the equation is real, we will want the real solutions. We note the following lemma: 


Lemma 5.3.23 Let $A$ be a real n×n matrix, and let $X(t)$ be a complex-valued solution of
the differential equation $\frac{dX}{dt} = AX$. The real and imaginary parts of $X(t)$ solve the same
equation. □


Now every solution of the original equation (5.3.7), whether real or complex, has the
form (5.3.21) for some complex numbers $c_i$. So the real solutions are among those we have
found. To write them down explicitly, we may take the real and imaginary parts of the
complex solutions.

The real and imaginary parts of the basic solutions (5.3.22) are determined using
(5.3.19). They are


(5.3.25)    $\begin{pmatrix} e^t\cos t \\ -e^t\sin t \end{pmatrix}$ and $\begin{pmatrix} e^t\sin t \\ e^t\cos t \end{pmatrix}.$

Every real solution is a real linear combination of these particular solutions.
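The passage from complex to real solutions can be checked with complex arithmetic. The sketch below assumes the example matrix with rows $(1, 1)$ and $(-1, 1)$, the eigenvector $(1, i)^t$, and the eigenvalue $1 + i$. It evaluates the complex solution $e^{(1+i)t}(1, i)^t$ and compares its real and imaginary parts with the two real solutions $e^t(\cos t, -\sin t)^t$ and $e^t(\sin t, \cos t)^t$. The function names are hypothetical.

```python
import cmath
import math

A = [[1, 1], [-1, 1]]

def complex_solution(t):
    """The complex solution e^{(1+i)t} (1, i)^t."""
    e = cmath.exp((1 + 1j) * t)
    return (e, 1j * e)

def real_solutions(t):
    """The two real solutions e^t (cos t, -sin t)^t and e^t (sin t, cos t)^t."""
    et = math.exp(t)
    return ((et * math.cos(t), -et * math.sin(t)),
            (et * math.sin(t), et * math.cos(t)))

z = complex_solution(0.5)
print([w.real for w in z], [w.imag for w in z])
print(real_solutions(0.5))
```

The second complex basic solution, $e^{(1-i)t}(i, 1)^t$, has the same two real vectors as its real and imaginary parts, in the opposite order, so it contributes nothing new.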


5.4 THE MATRIX EXPONENTIAL 


Systems of first-order linear, constant-coefficient differential equations can be solved for- 
mally, using the matrix exponential. 

The exponential of an n×n real or complex matrix $A$ is the matrix obtained by
substituting $A$ for $x$ and $I$ for 1 into the Taylor's series for $e^x$, which is


(5.4.1)    $e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots.$

Thus by definition,


(5.4.2)    $e^A = I + \frac{A}{1!} + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots.$


We will be interested mainly in the matrix-valued function $e^{tA}$ of the variable scalar $t$,
so we substitute $tA$ for $A$:


(5.4.3)    $e^{tA} = I + \frac{tA}{1!} + \frac{t^2A^2}{2!} + \frac{t^3A^3}{3!} + \cdots.$


Theorem 5.4.4


(a) The series (5.4.2) converges absolutely and uniformly on bounded sets of complex
matrices.

(b) $e^{tA}$ is a differentiable function of $t$, and its derivative is the matrix product $Ae^{tA}$.

(c) Let $A$ and $B$ be complex n×n matrices that commute: $AB = BA$. Then $e^{A+B} = e^Ae^B$.


In order not to break up the discussion, we have moved the proof of this theorem to the end
of the section.

The hypothesis that $A$ and $B$ commute is essential for carrying the fundamental
property $e^{x+y} = e^xe^y$ over to matrices. Nevertheless, (c) is very useful.
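A small example, not taken from the text, shows what can go wrong without the commuting hypothesis. The two matrices below are chosen because their squares vanish, so their exponential series terminate after two terms.

```latex
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad
AB = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \neq
\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = BA.

% A^2 = B^2 = 0, so the series (5.4.2) stop after the linear term:
e^A = I + A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad
e^B = I + B = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \quad
e^A e^B = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}.

% (A+B)^2 = I, so even and odd powers collect into cosh and sinh:
e^{A+B} = (\cosh 1)\, I + (\sinh 1)(A + B)
        = \begin{pmatrix} \cosh 1 & \sinh 1 \\ \sinh 1 & \cosh 1 \end{pmatrix}
        \neq e^A e^B .
```

Since $\cosh 1 \approx 1.543$ while the upper left entry of $e^Ae^B$ is 2, the two sides differ. Note also that $e^Be^A$ has rows $(1, 1)$ and $(1, 2)$, so $e^Ae^B \neq e^Be^A$ as well.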


Corollary 5.4.5 For any n×n complex matrix $A$, the exponential $e^A$ is invertible, and its
inverse is $e^{-A}$.


Proof. Because $A$ and $-A$ commute, $e^Ae^{-A} = e^{A-A} = e^0 = I$. □


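Since matrix multiplication is relatively complicated, it is often not easy to write down
the entries of the matrix $e^A$. They won't be obtained by exponentiating the entries of $A$ unless
$A$ is a diagonal matrix. If $A$ is diagonal, with diagonal entries $\lambda_1, \ldots, \lambda_n$, then inspection of
the series shows that $e^A$ is also diagonal, and that its diagonal entries are $e^{\lambda_i}$.

The exponential is also fairly easy to compute for a triangular 2×2 matrix. For
example, if

$A = \begin{pmatrix} 1 & 1 \\ & 2 \end{pmatrix},$

then


(5.4.6)    $e^A = \begin{pmatrix} 1 & \\ & 1 \end{pmatrix} + \frac{1}{1!}\begin{pmatrix} 1 & 1 \\ & 2 \end{pmatrix} + \frac{1}{2!}\begin{pmatrix} 1 & 3 \\ & 4 \end{pmatrix} + \cdots = \begin{pmatrix} e & * \\ & e^2 \end{pmatrix}.$


It is a good exercise to calculate the missing entry $*$ directly from the series.

The exponential $e^A$ can be determined whenever we know a matrix $P$ such that
$\Lambda = P^{-1}AP$ is diagonal. Using the rule $P^{-1}A^kP = (P^{-1}AP)^k$ (4.6.12) and the distributive law
for matrix multiplication,


(5.4.7)    $P^{-1}e^AP = (P^{-1}IP) + \frac{(P^{-1}AP)}{1!} + \frac{(P^{-1}AP)^2}{2!} + \cdots = e^{P^{-1}AP} = e^\Lambda.$


Suppose that $\Lambda$ is diagonal, with diagonal entries $\lambda_i$. Then $e^\Lambda$ is also diagonal, and its
diagonal entries are $e^{\lambda_i}$. In this case we can compute $e^A$ explicitly:


(5.4.8)    $e^A = Pe^\Lambda P^{-1}.$


For example, if $A = \begin{pmatrix} 1 & 1 \\ & 2 \end{pmatrix}$ and $P = \begin{pmatrix} 1 & 1 \\ & 1 \end{pmatrix}$, then $P^{-1}AP = \Lambda = \begin{pmatrix} 1 & \\ & 2 \end{pmatrix}$, and


$e^A = Pe^\Lambda P^{-1} = \begin{pmatrix} 1 & 1 \\ & 1 \end{pmatrix}\begin{pmatrix} e & \\ & e^2 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ & 1 \end{pmatrix} = \begin{pmatrix} e & e^2 - e \\ & e^2 \end{pmatrix}.$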
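The entry in the upper right corner can also be approximated straight from the series. The following sketch sums the first 40 terms of the exponential series for the upper triangular matrix with rows $(1, 1)$ and $(0, 2)$. The helper names `mat_mul` and `mat_exp` and the cutoff of 40 terms are choices made for this illustration, not part of the text.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """Partial sum I + A/1! + ... + A^(terms-1)/(terms-1)! of the exponential series."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]          # holds A^k / k!, starting with k = 0
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

E = mat_exp([[1, 1], [0, 2]])
print(E)
print(math.e, math.e**2 - math.e, math.e**2)
```

Up to rounding, the partial sum agrees with the value $e^2 - e$ obtained by diagonalization. A truncated series is adequate for a small matrix like this one, but it is not a robust general-purpose method.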


The next theorem relates the matrix exponential to differential equations: 


Theorem 5.4.9 Let $A$ be a real or complex n×n matrix. The columns of the matrix $e^{tA}$ form
a basis for the space of solutions of the differential equation $\frac{dX}{dt} = AX$.


Proof. Theorem 5.4.4(b) shows that the columns of $e^{tA}$ solve the differential equation. To
show that every solution is a linear combination of the columns, we copy the proof given at
the beginning of Section 5.3. Let $X(t)$ be an arbitrary solution. We differentiate the matrix
product $e^{-tA}X(t)$ using the product rule (5.3.6):


(5.4.10)    $\frac{d}{dt}\left(e^{-tA}X(t)\right) = \left(-Ae^{-tA}\right)X(t) + e^{-tA}\left(AX(t)\right).$


Fortunately, $A$ and $e^{-tA}$ commute. This follows directly from the definition of the
exponential. So the derivative is zero. Therefore $e^{-tA}X(t)$ is a constant column vector, say
$C = (c_1, \ldots, c_n)^t$, and $X(t) = e^{tA}C$. This expresses $X(t)$ as a linear combination of the
columns of $e^{tA}$, with coefficients $c_i$. The expression is unique because $e^{tA}$ is an invertible
matrix. □


Though the matrix exponential always solves the differential equation (5.3.7), it may
not be easy to apply in a concrete situation because computation of the exponential can be
difficult. But if $A$ is diagonalizable, the exponential can be computed as in (5.4.8). We can
use this method of evaluating $e^{tA}$ to solve equation (5.3.7). Of course we will get the same
solutions as we did before. Thus if $A$, $P$, and $\Lambda$ are as in (5.3.16), then


$e^{tA} = Pe^{t\Lambda}P^{-1} = \begin{pmatrix} 1 & 2 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} e^{5t} & \\ & e^{2t} \end{pmatrix}\left(-\frac{1}{3}\right)\begin{pmatrix} -1 & -2 \\ -1 & 1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} (e^{5t} + 2e^{2t}) & (2e^{5t} - 2e^{2t}) \\ (e^{5t} - e^{2t}) & (2e^{5t} + e^{2t}) \end{pmatrix}.$

The columns of the matrix on the right form a second basis for the space of solutions
that was obtained in (5.3.17).
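As a sanity check on this computation, one can evaluate the matrix $e^{tA}$ and its derivative at $t = 0$, and express its columns in terms of the basic solutions $X_1 = (e^{5t}, e^{5t})^t$ and $X_2 = (2e^{2t}, -e^{2t})^t$ found earlier. The labels $X_1$ and $X_2$ are introduced here only for this check.

```latex
% At t = 0 the exponential must be the identity:
e^{0A} = \tfrac{1}{3}\begin{pmatrix} 1 + 2 & 2 - 2 \\ 1 - 1 & 2 + 1 \end{pmatrix} = I .

% By Theorem 5.4.4(b), the derivative at t = 0 must be A:
\left.\frac{d}{dt}\, e^{tA}\right|_{t=0}
  = \tfrac{1}{3}\begin{pmatrix} 5 + 4 & 10 - 4 \\ 5 - 2 & 10 + 2 \end{pmatrix}
  = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} = A .

% The columns of e^{tA} are combinations of the basic solutions X_1, X_2:
\text{first column} = \tfrac{1}{3}\,(X_1 + X_2), \qquad
\text{second column} = \tfrac{1}{3}\,(2X_1 - X_2) .
```

The last line makes the change of basis explicit: the columns of $e^{tA}$ are the solutions with initial values $e_1$ and $e_2$.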


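One can also use Jordan form to solve the differential equation. The solutions for
an arbitrary k×k Jordan block $J_\lambda$ (4.7.5) can be determined by computing the matrix
exponential. We write $J_\lambda = \lambda I + N$, as in (4.7.12), where $N$ is the k×k Jordan block $J_0$ with
$\lambda = 0$. Then $N^k = 0$, so

$e^{tN} = I + \frac{tN}{1!} + \cdots + \frac{t^{k-1}N^{k-1}}{(k-1)!}.$


Since $N$ and $\lambda I$ commute,


$e^{tJ} = e^{t\lambda I}e^{tN} = e^{\lambda t}\left(I + \frac{tN}{1!} + \cdots + \frac{t^{k-1}N^{k-1}}{(k-1)!}\right).$


Thus if $J$ is the 3×3 block

$J = \begin{pmatrix} 3 & & \\ 1 & 3 & \\ & 1 & 3 \end{pmatrix},$

then


$e^{tJ} = \begin{pmatrix} e^{3t} & & \\ & e^{3t} & \\ & & e^{3t} \end{pmatrix}\begin{pmatrix} 1 & & \\ t & 1 & \\ \frac{1}{2!}t^2 & t & 1 \end{pmatrix} = \begin{pmatrix} e^{3t} & & \\ te^{3t} & e^{3t} & \\ \frac{1}{2}t^2e^{3t} & te^{3t} & e^{3t} \end{pmatrix}.$


The columns of this matrix form a basis for the space of solutions of the differential
equation $\frac{dX}{dt} = JX$.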
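The closed form for a Jordan block is easy to turn into code, because the powers of $N$ simply move the line of ones down one subdiagonal at a time. The sketch below assumes the convention that the ones of the block sit just below the diagonal; the function name `jordan_block_exp` is hypothetical.

```python
import math

def jordan_block_exp(lam, k, t):
    """e^{tJ} for the k x k Jordan block J = lam*I + N, ones just below the diagonal.

    Uses e^{tJ} = e^{lam t} (I + tN/1! + ... + t^(k-1) N^(k-1)/(k-1)!).
    N^m has ones on the m-th subdiagonal, so the (i, j) entry is
    e^{lam t} * t^(i-j) / (i-j)! when i >= j, and 0 otherwise.
    """
    scale = math.exp(lam * t)
    return [[scale * t ** (i - j) / math.factorial(i - j) if i >= j else 0.0
             for j in range(k)]
            for i in range(k)]

for row in jordan_block_exp(3, 3, 0.5):
    print(row)
```

Because the series for $e^{tN}$ is a finite sum, no truncation error is involved here. The only approximation is floating-point rounding.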


We now go back to prove Theorem 5.4.4. The main facts about limits of series that we 
will use are given below, together with references to [Mattuck] and [Rudin]. Those authors 
consider only real valued functions, but the proofs carry over to complex valued functions 
because limits and derivatives of complex valued functions can be defined by working on the 
real and imaginary parts separately. 

If $r$ and $s$ are real numbers with $r < s$, the notation $[r, s]$ stands for the interval
$r \le t \le s$.


Theorem 5.4.11 ([Mattuck], Theorem 22.2B, [Rudin], Theorem 7.9). Let $m_k$ be a series of
positive real numbers such that $\sum m_k$ converges. If $u^{(k)}(t)$ are functions on an interval $[r, s]$,
and if $|u^{(k)}(t)| \le m_k$ for all $k$ and all $t$ in the interval, then the series $\sum u^{(k)}(t)$ converges
uniformly on the interval. □


Theorem 5.4.12 ([Mattuck], Theorem 11.5B, [Rudin], Theorem 7.17). Let $u^{(k)}(t)$ be a
sequence of functions with continuous derivatives on an interval $[r, s]$. Suppose that the
series $\sum u^{(k)}(t)$ converges to a function $f(t)$ and also that the series of derivatives $\sum u'^{(k)}(t)$
converges uniformly to a function $g(t)$, on the interval. Then $f$ is differentiable on the
interval, and its derivative is $g$. □

Proof of Theorem 5.4.4(a). We denote the $i,j$-entry of a matrix $A$ by $(A)_{ij}$ here. So $(AB)_{ij}$
stands for the entry of the product matrix $AB$, and $(A^k)_{ij}$ for the entry of the $k$th power $A^k$.
With this notation, the $i,j$-entry of $e^A$ is the sum of the series


(5.4.13)    $(e^A)_{ij} = (I)_{ij} + \frac{(A)_{ij}}{1!} + \frac{(A^2)_{ij}}{2!} + \frac{(A^3)_{ij}}{3!} + \cdots.$


To prove that the series for the exponential converges absolutely and uniformly, we need to
show that the entries of the powers $A^k$ do not grow too quickly.


We denote by $\|A\|$ the maximum absolute value of the entries of a matrix $A$, the smallest
real number such that


(5.4.14)    $|(A)_{ij}| \le \|A\|$ for all $i, j$.


Its basic property is this:


Lemma 5.4.15 Let $A$ and $B$ be complex n×n matrices. Then $\|AB\| \le n\|A\|\,\|B\|$, and for all
$k > 0$, $\|A^k\| \le n^{k-1}\|A\|^k$.


Proof. We estimate the size of the $i,j$-entry of $AB$:


$|(AB)_{ij}| = \left|\sum_{\nu=1}^{n}(A)_{i\nu}(B)_{\nu j}\right| \le \sum_{\nu=1}^{n}|(A)_{i\nu}|\,|(B)_{\nu j}| \le n\|A\|\,\|B\|.$

The second inequality follows by induction from the first one. □


We now estimate the exponential series: Let $a$ be a positive real number such that
$n\|A\| \le a$. The lemma tells us that $|(A^k)_{ij}| \le a^k$ (with one $n$ to spare). So


(5.4.16)    $|(e^A)_{ij}| \le |(I)_{ij}| + \frac{1}{1!}|(A)_{ij}| + \frac{1}{2!}|(A^2)_{ij}| + \frac{1}{3!}|(A^3)_{ij}| + \cdots \le 1 + \frac{a}{1!} + \frac{a^2}{2!} + \frac{a^3}{3!} + \cdots.$

The ratio test shows that the last series converges (to $e^a$ of course). Theorem 5.4.11 shows
that the series for $e^A$ converges absolutely and uniformly for all $A$ with $n\|A\| \le a$. □


Proof of Theorem 5.4.4(b),(c). We use a trick to shorten the proofs. That is to begin by
differentiating the series for $e^{tA+B}$, assuming that $A$ and $B$ are commuting n×n matrices.
The derivative of $tA + B$ is $A$, and


(5.4.17)    $e^{tA+B} = I + \frac{(tA + B)}{1!} + \frac{(tA + B)^2}{2!} + \cdots.$


Using the product rule (5.3.6), we see that, for $k > 0$, the derivative of the term of degree $k$
of this series is


$\frac{d}{dt}\left(\frac{(tA + B)^k}{k!}\right) = \frac{1}{k!}\sum_{i=1}^{k}(tA + B)^{i-1}A\,(tA + B)^{k-i}.$

Since $AB = BA$, we can pull the $A$ in the middle out to the left:


(5.4.18)    $\frac{d}{dt}\left(\frac{(tA + B)^k}{k!}\right) = kA\,\frac{(tA + B)^{k-1}}{k!} = A\,\frac{(tA + B)^{k-1}}{(k-1)!}.$


This is the product of the matrix $A$ and the term of degree $k - 1$ of the exponential series.
So term-by-term differentiation of (5.4.17) yields the series for $Ae^{tA+B}$.


To justify term-by-term differentiation, we apply Theorem 5.4.4(a). The theorem shows
that for given $A$ and $B$, the exponential series $e^{tA+B}$ converges uniformly on any interval
$r \le t \le s$. Moreover, the series of derivatives converges uniformly to $Ae^{tA+B}$. By Theorem
5.4.12, the derivative of $e^{tA+B}$ can be computed term by term, so it is true that


$\frac{d}{dt}e^{tA+B} = Ae^{tA+B}$


for any pair $A$, $B$ of matrices that commute. Taking $B = 0$ proves Theorem 5.4.4(b).


Next, we copy the method used in the proof of Theorem 5.4.9. We differentiate the
product $e^{-tA}e^{tA+B}$, again assuming that $A$ and $B$ commute. As in (5.4.10), we find that


$\frac{d}{dt}\left(e^{-tA}e^{tA+B}\right) = \left(-Ae^{-tA}\right)\left(e^{tA+B}\right) + \left(e^{-tA}\right)\left(Ae^{tA+B}\right) = 0.$


Therefore $e^{-tA}e^{tA+B} = C$, where $C$ is a constant matrix. Setting $t = 0$ shows that $e^B = C$.
Setting $B = 0$ shows that $e^{-tA} = (e^{tA})^{-1}$. Then $(e^{tA})^{-1}e^{tA+B} = e^B$. Setting $t = 1$ shows that
$e^{A+B} = e^Ae^B$. This proves Theorem 5.4.4(c). □


We will use the remarkable properties of the matrix exponential again, in Chapter 9. 


I have not thought it necessary to undertake the labour
of a formal proof of the theorem in the general case.


—Arthur Cayley¹


EXERCISES 


Section 1 Orthogonal Matrices and Rotations 


1.1. Determine the matrices that represent the following rotations of $\mathbb{R}^3$:
(a) angle $\theta$, the axis $e_2$, (b) angle $2\pi/3$, axis contains the vector $(1, 1, 1)^t$, (c) angle $\pi/2$,
axis contains the vector $(1, 1, 0)^t$.


1.2. What are the complex eigenvalues of the matrix $A$ that represents a rotation of $\mathbb{R}^3$ through
the angle $\theta$ about a pole $u$?


1.3. Is $O_n$ isomorphic to the product group $SO_n \times \{\pm I\}$?

1.4. Describe geometrically the action of an orthogonal 3×3 matrix with determinant -1.

¹Arthur Cayley, one of the mathematicians for whom the Cayley-Hamilton Theorem is named, stated that
theorem for n×n matrices in one of his papers, and then checked the 2×2 case (see (5.2.4)). He closed his
discussion of the theorem with the sentence quoted here.

1.5. Let $A$ be a 3×3 orthogonal matrix with $\det A = 1$, whose angle of rotation is different from
0 or $\pi$, and let $M = A - A^t$.


(a) Show that M has rank 2, and that a nonzero vector X in the nullspace of M is an 
eigenvector of A with eigenvalue 1. 


(b) Find such an eigenvector explicitly in terms of the entries of the matrix A. 


Section 2 Using Continuity 


2.1. Use the Cayley-Hamilton Theorem to express $A^{-1}$ in terms of $A$, $(\det A)^{-1}$, and the
coefficients of the characteristic polynomial. Verify your expression in the 2×2 case.


2.2. Let $A$ be m×m and $B$ be n×n complex matrices, and consider the linear operator $T$ on
the space $\mathbb{C}^{m \times n}$ of all complex m×n matrices defined by $T(M) = AMB$.


(a) Show how to construct an eigenvector for $T$ out of a pair of column vectors $X$, $Y$, where
$X$ is an eigenvector for $A$ and $Y$ is an eigenvector for $B^t$.


(b) Determine the eigenvalues of $T$ in terms of those of $A$ and $B$.
(c) Determine the trace of this operator.


2.3. Let $A$ be an n×n complex matrix.


(a) Consider the linear operator $T$ defined on the space $\mathbb{C}^{n \times n}$ of all complex n×n matrices
by the rule $T(M) = AM - MA$. Prove that the rank of this operator is at most $n^2 - n$.


(b) Determine the eigenvalues of $T$ in terms of the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$.


2.4. Let $A$ and $B$ be diagonalizable complex matrices. Prove that there is an invertible matrix $P$
such that $P^{-1}AP$ and $P^{-1}BP$ are both diagonal if and only if $AB = BA$.
Section 3 Systems of Differential Equations 
3.1. Prove the product rule for differentiation of matrix-valued functions. 
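3.2. Let $A(t)$ and $B(t)$ be differentiable matrix-valued functions of $t$. Compute

(a) $\frac{d}{dt}\left(A(t)^3\right)$, (b) $\frac{d}{dt}\left(A(t)^{-1}\right)$, (c) $\frac{d}{dt}\left(A(t)^{-1}B(t)\right)$.


3.3. Solve the equation $\frac{dX}{dt} = AX$ for the following matrices $A$: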


2 i ac 4 12 3 001 
(a) E |.) & i]. 00 4|,@]1 0 Oo}. 
§ O.4t 010 


3.4. Let $A$ and $B$ be constant matrices, with $A$ invertible. Solve the inhomogeneous differential
equation $\frac{dX}{dt} = AX + B$ in terms of the solutions to the equation $\frac{dX}{dt} = AX$.


Section 4 The Matrix Exponential 
4.1. Compute $e^A$ for the following matrices $A$:


0 
b dni 2m 0 -b 1 0 
(a) i |. om |e E At (a) li AI 8 ee 


4.2. Prove the formula $e^{\operatorname{trace} A} = \det(e^A)$.


4.3. Let $X$ be an eigenvector of an n×n matrix $A$, with eigenvalue $\lambda$.

(a) Prove that if $A$ is invertible then $X$ is an eigenvector for $A^{-1}$, with eigenvalue $\lambda^{-1}$.
(b) Prove that $X$ is an eigenvector for $e^A$, with eigenvalue $e^\lambda$.


4.4. Let $A$ and $B$ be commuting matrices. To prove that $e^{A+B} = e^Ae^B$, one can begin by
expanding the two sides into double sums whose terms are multiples of $A^iB^j$. Prove that
the two double sums one obtains are the same.


4.5. Solve the differential equation $\frac{dX}{dt} = AX$ when $A$ is the given matrix:


1 
2 0 0 
(a) E ah w | Ae ji. 


4.6. For an n×n matrix $A$, define $\sin A$ and $\cos A$ by using the Taylor's series expansions for
$\sin x$ and $\cos x$.

(a) Prove that these series converge for all $A$.
(b) Prove that $\sin(tA)$ is a differentiable function of $t$ and that $\frac{d}{dt}\sin(tA) = A\cos(tA)$.


4.7. Discuss the range of validity of the following identities:

(a) $\cos^2 A + \sin^2 A = I$,

(b) $e^{iA} = \cos A + i\sin A$,

(c) $\sin(A + B) = \sin A\cos B + \cos A\sin B$,

(d) $e^{2\pi iA} = I$,

(e) $\frac{d(e^{A(t)})}{dt} = e^{A(t)}\frac{dA}{dt}$, when $A(t)$ is a differentiable matrix-valued function of $t$.


4.8. Let $P$, $B_k$, and $B$ be n×n matrices, with $P$ invertible. Prove that if $B_k$ converges to $B$, then
$P^{-1}B_kP$ converges to $P^{-1}BP$.


Miscellaneous Problems 


M.1. 
M.2. 
M.3. 


M.4. 


M.5. 


Determine the group O, (Z) of orthogonal matrices with integer entries. 
Prove the Cayley-Hamilton Theorem using Jordan form. 


Let A be an nXn complex matrix. Prove that if trace A* = 0 for all k > 0, then A is 
nilpotent. 


Let A be acomplex n Xn matrix all of whose eigenvalues have absolute value less than 1. 
Prove that the series / + A + A2 +--- converges to (J — A)"}. 
The Fibonacci numbers 0, 1, 1, 2, 3, 5, 8,..., are defined by the recursive relations 


fn = fn-1 + fn-2, with the initial conditions fo = 0, f; = 1. This recursive relation can be 


written in matrix form as E | oe = ies : 


M.7. 


M.8. 
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Pi 1-a\? 
(a) Prove the formula f, = a (42) - (*) | , where a = J5. 


(b) Suppose that a sequence a, is defined by the relation a, = 4(an-1 + An—2). Compute 
the limit of the sequence a, in terms of dp, a}. 


M.6. (an integral operator) The space $C$ of continuous functions $f(u)$ on the interval [0, 1] is one 
of many infinite-dimensional analogues of $\mathbb{R}^n$, and continuous functions $A(u, v)$ on the 
square $0 \le u, v \le 1$ are infinite-dimensional analogues of matrices. The integral 

$A \cdot f = \int_0^1 A(u, v) f(v)\, dv$ 

is analogous to multiplication of a matrix and a vector. (To visualize this, rotate the unit 
square in the $u, v$-plane and the interval [0, 1] by 90° in the clockwise direction.) The 
response of a bridge to a variable load could, with suitable assumptions, be represented 
by such an integral. For this, $f$ would represent the load along the bridge, and then $A \cdot f$ 
would compute the vertical deflection of the bridge caused by that load. 

This problem treats the integral as a linear operator. For the function $A = u + v$, 
determine the image of the operator explicitly. Determine its nonzero eigenvalues, and 
describe its kernel in terms of the vanishing of some integrals. Do the same for the function 
$A = u^2 + v^2$. 

M.7. Let $A$ be a $2 \times 2$ complex matrix with distinct eigenvalues, and let $X$ be an indeterminate 
$2 \times 2$ matrix. How many solutions to the matrix equation $X^2 = A$ can there be? 


M.8. Find a geometric way to determine the axis of rotation for the composition of two 
three-dimensional rotations. 


CHAPTER 6 


Symmetry 


L'algèbre n'est qu'une géométrie écrite; 
la géométrie n'est qu'une algèbre figurée. 


—Sophie Germain 


Symmetry provides some of the most appealing applications of groups. Groups were 
invented to analyze symmetries of certain algebraic structures, field extensions (Chapter 16), 
and because symmetry is a common phenomenon, it is one of the two main ways in which 
group theory is applied. The other is through group representations, which are discussed in 
Chapter 10. The symmetries of plane figures, which we study in the first sections, provide a 
rich source of examples and a background for the general concept of a group operation that 
is introduced in Section 6.7. 

We allow free use of geometric reasoning. Carrying the arguments back to the axioms 
of geometry will be left for another occasion. 


6.1 SYMMETRY OF PLANE FIGURES 


Symmetries of plane figures are usually classified into the types shown below: 


(6.1.1) Bilateral Symmetry. 
(6.1.2) Rotational Symmetry. 


(6.1.3) Translational Symmetry. 


Figures such as these are supposed to extend indefinitely in both directions. There is also a 
fourth type of symmetry, though its name, glide symmetry, may be less familiar: 


(6.1.4) Glide Symmetry. 


Figures such as the wallpaper pattern shown below may have two independent translational 
symmetries, 


(6.1.5) 


and other combinations of symmetries may occur. The star has bilateral as well as rotational 
symmetry. In the figure below, translational and rotational symmetry are combined: 


(6.1.6) 


Another example: 


(6.1.7) 




A rigid motion of the plane is called an isometry, and if an isometry carries a subset 
F of the plane to itself, it is called a symmetry of F. The set of all symmetries of F forms a 
subgroup of the group of all isometries of the plane: If $m$ and $m'$ carry F to F, then so does 
the composed map $mm'$, and so on. This is the group of symmetries of F. 

Figure 6.1.3 has an infinite cyclic group of symmetries, generated by the translation 
$t$ that carries the figure one unit to the left: 


$G = \{\ldots, t^{-2}, t^{-1}, 1, t, t^2, \ldots\}.$ 

Figure 6.1.7 has symmetries in addition to translations. 


6.2 ISOMETRIES 


The distance between points of $\mathbb{R}^n$ is the length $|u - v|$ of the vector $u - v$. An isometry of 
n-dimensional space $\mathbb{R}^n$ is a distance-preserving map $f$ from $\mathbb{R}^n$ to itself, a map such that, 
for all $u$ and $v$ in $\mathbb{R}^n$, 


(6.2.1) $|f(u) - f(v)| = |u - v|.$ 

An isometry will map a figure to a congruent figure. 


Examples 6.2.2 


(a) Orthogonal linear operators are isometries. 


Because an orthogonal operator $\varphi$ is linear, $\varphi(u) - \varphi(v) = \varphi(u - v)$, so $|\varphi(u) - \varphi(v)| = 
|\varphi(u - v)|$, and because $\varphi$ is orthogonal, it preserves dot products and therefore lengths, 
so $|\varphi(u - v)| = |u - v|$. 

(b) Translation $t_a$ by a vector $a$, the map defined by $t_a(x) = x + a$, is an isometry. 


Translations are not linear operators because they don't send 0 to 0, except of course 
for translation by the zero vector, which is the identity map. 


(c) The composition of isometries is an isometry. □ 
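
The three examples can be checked numerically in the plane. This is a minimal sketch: the helper names (`rotation`, `apply`, `translate`, `dist`) and the sample vectors are assumptions made for illustration, and equality holds only up to floating-point rounding.

```python
import math

def rotation(theta):
    """Matrix of the rotation through the angle theta, an orthogonal operator on R^2."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def apply(matrix, x):
    """Multiply a 2x2 matrix by a vector x = (x1, x2)."""
    return (matrix[0][0] * x[0] + matrix[0][1] * x[1],
            matrix[1][0] * x[0] + matrix[1][1] * x[1])

def translate(a, x):
    """The translation t_a(x) = x + a."""
    return (x[0] + a[0], x[1] + a[1])

def dist(u, v):
    """The distance |u - v| between two points of the plane."""
    return math.hypot(u[0] - v[0], u[1] - v[1])

u, v, a = (1.0, 2.0), (-3.0, 0.5), (4.0, -1.0)
phi = rotation(0.7)

def f(x):
    """The composition t_a phi: first the orthogonal operator, then the translation."""
    return translate(a, apply(phi, x))

d_before = dist(u, v)
d_after = dist(f(u), f(v))     # should agree with d_before up to rounding
origin_image = translate(a, (0.0, 0.0))   # equals a, so t_a is not linear unless a = 0
```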


Theorem 6.2.3 The following conditions on a map $\varphi: \mathbb{R}^n \to \mathbb{R}^n$ are equivalent: 


(a) $\varphi$ is an isometry that fixes the origin: $\varphi(0) = 0$, 
(b) $\varphi$ preserves dot products: $(\varphi(v) \cdot \varphi(w)) = (v \cdot w)$, for all $v$ and $w$, 
(c) $\varphi$ is an orthogonal linear operator. 


We have seen that (c) implies (a). The neat proof of the implication (b) ⇒ (c) that we 
present next was found a few years ago by Sharon Hollander, when she was a student in an 
MIT algebra class. 


Lemma 6.2.4 Let $x$ and $y$ be points of $\mathbb{R}^n$. If the three dot products $(x \cdot x)$, $(x \cdot y)$, and 
$(y \cdot y)$ are equal, then $x = y$. 

Proof. Suppose that $(x \cdot x) = (x \cdot y) = (y \cdot y)$. Then 

$((x - y) \cdot (x - y)) = (x \cdot x) - 2(x \cdot y) + (y \cdot y) = 0.$ 

The length of $x - y$ is zero, and therefore $x = y$. □ 


Proof of Theorem 6.2.3, (b) ⇒ (c): Let $\varphi$ be a map that preserves dot products. Then it will 
be orthogonal, provided that it is a linear operator (5.1.12). To prove that $\varphi$ is a linear 
operator, we must show that $\varphi(u + v) = \varphi(u) + \varphi(v)$ and that $\varphi(cv) = c\varphi(v)$, for all $u$ and 
$v$ and all scalars $c$. 

Given $x$ in $\mathbb{R}^n$, we'll use the symbol $x'$ to stand for $\varphi(x)$. We also introduce the symbol 
$w$ for the sum, writing $w = u + v$. Then the relation $\varphi(u + v) = \varphi(u) + \varphi(v)$ that is to be 
shown becomes $w' = u' + v'$. 

We substitute $x = w'$ and $y = u' + v'$ into Lemma 6.2.4. To show that $w' = u' + v'$, it 
suffices to show that the three dot products 


$(w' \cdot w')$, $(w' \cdot (u' + v'))$, and $((u' + v') \cdot (u' + v'))$ 

are equal. We expand the second and third dot products. It suffices to show that 

$(w' \cdot w') = (w' \cdot u') + (w' \cdot v') = (u' \cdot u') + 2(u' \cdot v') + (v' \cdot v').$ 


By hypothesis, $\varphi$ preserves dot products. So we may drop the primes: $(w' \cdot w') = (w \cdot w)$, 
etc. Then it suffices to show that 


(6.2.5) $(w \cdot w) = (w \cdot u) + (w \cdot v) = (u \cdot u) + 2(u \cdot v) + (v \cdot v).$ 


Now whereas $w' = u' + v'$ is to be shown, $w = u + v$ is true by definition. So we may 
substitute $u + v$ for $w$. Then (6.2.5) becomes true. 

To prove that $\varphi(cv) = c\varphi(v)$, we write $u = cv$, and we must show that $u' = cv'$. The 
proof is analogous to the one we have just given. □ 


Proof of Theorem 6.2.3, (a) ⇒ (b): Let $\varphi$ be an isometry that fixes the origin. With the 
prime notation, the distance-preserving property of $\varphi$ reads 


(6.2.6) $((u' - v') \cdot (u' - v')) = ((u - v) \cdot (u - v)),$ 


for all $u$ and $v$ in $\mathbb{R}^n$. We substitute $v = 0$. Since $0' = 0$, $(u' \cdot u') = (u \cdot u)$. Similarly, 
$(v' \cdot v') = (v \cdot v)$. Now (b) follows when we expand (6.2.6) and cancel $(u \cdot u)$ and $(v \cdot v)$ 
from the two sides of the equation. □ 


Corollary 6.2.7 Every isometry $f$ of $\mathbb{R}^n$ is the composition of an orthogonal linear operator 
and a translation. More precisely, if $f$ is an isometry and if $f(0) = a$, then $f = t_a\varphi$, where 
$t_a$ is a translation and $\varphi$ is an orthogonal linear operator. This expression for $f$ is unique. 


Proof. Let $f$ be an isometry, let $a = f(0)$, and let $\varphi = t_{-a}f$. Then $t_a\varphi = f$. The corollary 
amounts to the assertion that $\varphi$ is an orthogonal linear operator. Since $\varphi$ is the composition 
of the isometries $t_{-a}$ and $f$, it is an isometry. Also, $\varphi(0) = t_{-a}f(0) = t_{-a}(a) = 0$, so $\varphi$ fixes 
the origin. Theorem 6.2.3 shows that $\varphi$ is an orthogonal linear operator. The expression 
$f = t_a\varphi$ is unique because, since $\varphi(0) = 0$, we must have $a = f(0)$, and then $\varphi = t_{-a}f$. □ 


To work with the expressions $t_a\varphi$ for isometries, we need to determine the product 
(the composition) of two such expressions. We know that the composition $\varphi\psi$ of orthogonal 
operators is an orthogonal operator. The other rules are: 


(6.2.8) $t_a t_b = t_{a+b}$ and $\varphi t_a = t_{a'}\varphi$, where $a' = \varphi(a)$. 

We verify the last relation: $\varphi t_a(x) = \varphi(x + a) = \varphi(x) + \varphi(a) = \varphi(x) + a' = t_{a'}\varphi(x)$. 


Corollary 6.2.9 The set of all isometries of $\mathbb{R}^n$ forms a group that we denote by $M_n$, with 
composition of functions as its law of composition. 


Proof. The composition of isometries is an isometry, and the inverse of an isometry is an 
isometry too, because orthogonal operators and translations are invertible, and if $f = t_a\varphi$, 
then $f^{-1} = \varphi^{-1}t_a^{-1} = \varphi^{-1}t_{-a}$. This is a composition of isometries. □ 
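
The composition rule and the formula for the inverse can be tried out in the plane by storing an isometry $t_a\varphi$ as a pair (a, matrix of $\varphi$). This is a sketch under that assumed representation; the function names `compose`, `inverse`, and `act` are hypothetical, and the checks hold up to floating-point rounding.

```python
import math

def matmul(p, q):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def matvec(p, x):
    """Product of a 2x2 matrix and a vector."""
    return tuple(sum(p[i][k] * x[k] for k in range(2)) for i in range(2))

def compose(f, g):
    """(t_a phi)(t_b psi) = t_{a + phi(b)} (phi psi), using the rules (6.2.8)."""
    (a, phi), (b, psi) = f, g
    phi_b = matvec(phi, b)
    return ((a[0] + phi_b[0], a[1] + phi_b[1]), matmul(phi, psi))

def inverse(f):
    """(t_a phi)^{-1} = phi^{-1} t_{-a} = t_{-phi^{-1}(a)} phi^{-1}."""
    a, phi = f
    phi_inv = tuple(zip(*phi))      # the inverse of an orthogonal matrix is its transpose
    c = matvec(phi_inv, a)
    return ((-c[0], -c[1]), phi_inv)

def act(f, x):
    """Apply the isometry t_a phi to the point x."""
    a, phi = f
    y = matvec(phi, x)
    return (y[0] + a[0], y[1] + a[1])

c, s = math.cos(1.1), math.sin(1.1)
f = ((2.0, -1.0), ((c, -s), (s, c)))            # a rotation followed by a translation
g = ((0.5, 3.0), ((1.0, 0.0), (0.0, -1.0)))     # a reflection followed by a translation
x = (0.3, -0.7)

lhs = act(compose(f, g), x)        # (fg)(x)
rhs = act(f, act(g, x))            # f(g(x))
back = act(inverse(f), act(f, x))  # f^{-1}(f(x)), which should return x
```

Note that the orthogonal part of `compose(f, g)` is the product of the orthogonal parts, with no contribution from the translation vectors.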


Note: It isn’t very easy to verify, directly from the definition, that an isometry is invertible. 


The Homomorphism $M_n \to O_n$ 


There is an important map $\pi: M_n \to O_n$, defined by dropping the translation part of an 
isometry $f$. We write $f$ (uniquely) in the form $f = t_a\varphi$, and define $\pi(f) = \varphi$. 


Proposition 6.2.10 The map $\pi$ is a surjective homomorphism. Its kernel is the set $T = \{t_v\}$ 
of translations, which is a normal subgroup of $M_n$. 


Proof. It is obvious that $\pi$ is surjective, and once we show that $\pi$ is a homomorphism, it 
will be obvious that $T$ is its kernel, hence that $T$ is a normal subgroup. We must show that 
if $f$ and $g$ are isometries, then $\pi(fg) = \pi(f)\pi(g)$. Say that $f = t_a\varphi$ and $g = t_b\psi$, so that 
$\pi(f) = \varphi$ and $\pi(g) = \psi$. Then $\varphi t_b = t_{b'}\varphi$, where $b' = \varphi(b)$, and $fg = t_a\varphi t_b\psi = t_{a+b'}\varphi\psi$. 
So $\pi(fg) = \varphi\psi = \pi(f)\pi(g)$. □ 


Change of Coordinates 


Let P denote an n-dimensional space. The formula $t_a\varphi$ for an isometry depends on our 
choice of coordinates, so let’s ask how the formula changes when coordinates are changed. 
We will allow changes by orthogonal matrices and also shifts of the origin by translations. In 
other words, we may change coordinates by any isometry. 

To analyze the effect of such a change, we begin with an isometry $f$, a point $p$ of P, and 
its image $q = f(p)$, without reference to coordinates. When we introduce our coordinate 
system, the space P becomes identified with $\mathbb{R}^n$, and the points $p$ and $q$ have coordinates, 
say $x = (x_1, \ldots, x_n)^t$ and $y = (y_1, \ldots, y_n)^t$. Also, the isometry $f$ will have a formula $t_a\varphi$ 
in terms of the coordinates; let's call that formula $m$. The equation $q = f(p)$ translates to 
$y = m(x)$ $(= t_a\varphi(x))$. We want to determine what happens to the coordinate vectors and to 
the formula, when we change coordinates. The analogous computation for change of basis 
in a linear operator gives the clue: $m$ will be changed by conjugation. 

Our change in coordinates will be given by some isometry, let's denote it by $\eta$ (eta). 
Let the new coordinate vectors of $p$ and $q$ be $x'$ and $y'$. The new formula $m'$ for $f$ is the one 
such that $m'(x') = y'$. We also have the formula $\eta(x') = x$ analogous to the change of basis 
formula $PX' = X$ (3.5.11). 

We substitute $\eta(x') = x$ and $\eta(y') = y$ into the equation $m(x) = y$, obtaining $m\eta(x') 
= \eta(y')$, or $\eta^{-1}m\eta(x') = y'$. The new formula is the conjugate, as expected: 


(6.2.11) $m' = \eta^{-1} m \eta.$ 


Corollary 6.2.12 The homomorphism $\pi: M_n \to O_n$ (6.2.10) does not change when the 
origin is shifted by a translation. 


When the origin is shifted by a translation $t_v = \eta$, (6.2.11) reads $m' = t_{-v}mt_v$. Since 
translations are in the kernel of $\pi$ and since $\pi$ is a homomorphism, $\pi(m') = \pi(m)$. □ 


Orientation 


The determinant of an orthogonal operator $\varphi$ on $\mathbb{R}^n$ is $\pm 1$. The operator is said to be 
orientation-preserving if its determinant is 1 and orientation-reversing if its determinant is 
$-1$. Similarly, an orientation-preserving (or orientation-reversing) isometry $f$ is one such 
that, when it is written in the form $f = t_a\varphi$, the operator $\varphi$ is orientation-preserving (or 
orientation-reversing). An isometry of the plane is orientation-reversing if it interchanges 
front and back of the plane, and orientation-preserving if it maps the front to the front. 

The map 


(6.2.13) $\sigma: M_n \to \{\pm 1\}$ 


that sends an orientation-preserving isometry to 1 and an orientation-reversing isometry to 
$-1$ is a group homomorphism. 


6.3 ISOMETRIES OF THE PLANE 


In this section we describe isometries of the plane, both algebraically and geometrically. 

We denote the group of isometries of the plane by M. To compute in this group, we 
choose some special isometries as generators, and we obtain relations among them. The 
relations are somewhat analogous to those that define the symmetric group $S_3$, but because 
M is infinite, there are more of them. 

We choose a coordinate system and use it to identify the plane P with the space $\mathbb{R}^2$. 
Then we choose as generators the translations, the rotations about the origin, and the 
reflection about the $e_1$-axis. We denote the rotation through the angle $\theta$ by $\rho_\theta$, and the 
reflection about the $e_1$-axis by $r$. These are linear operators whose matrices $R$ and $S_0$ were 
exhibited before (see (5.1.17) and (5.1.16)). 

(6.3.1) 
1. translation $t_a$ by a vector $a$: $t_a(x) = x + a = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}$ 


2. rotation $\rho_\theta$ by an angle $\theta$ about the origin: $\rho_\theta(x) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ 


3. reflection $r$ about the $e_1$-axis: $r(x) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ 


We haven’t listed all of the isometries. Rotations about a point other than the origin 
aren’t included, nor are reflections about other lines, or glides. However, every element of 
M is a product of these isometries, so they generate the group. 


Theorem 6.3.2 Let $m$ be an isometry of the plane. Then $m = t_v\rho_\theta$, or else $m = t_v\rho_\theta r$, for 
a uniquely determined vector $v$ and angle $\theta$, possibly zero. 


Proof. Corollary 6.2.7 asserts that any isometry $m$ is written uniquely in the form $m = t_v\varphi$ 
where $\varphi$ is an orthogonal operator. And the orthogonal linear operators on $\mathbb{R}^2$ are the 
rotations $\rho_\theta$ about the origin and the reflections about lines through the origin. The 
reflections have the form $\rho_\theta r$ (see (5.1.17)). □ 

An isometry of the form $t_v\rho_\theta$ preserves orientation while $t_v\rho_\theta r$ reverses orientation. 

Computation in M can be done with the symbols $t_v$, $\rho_\theta$, and $r$, using the following rules 
for composing them. The rules can be verified using Formulas 6.3.1 (see also (6.2.8)). 


(6.3.3) 
$\rho_\theta t_v = t_{v'}\rho_\theta$, where $v' = \rho_\theta(v)$, 
$r t_v = t_{v'} r$, where $v' = r(v)$, 
$r\rho_\theta = \rho_{-\theta} r$, 
$t_v t_w = t_{v+w}$, $\rho_\theta\rho_\eta = \rho_{\theta+\eta}$, and $rr = 1$. 
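
The rules (6.3.3) can be spot-checked numerically. In the sketch below we swap the matrices of (6.3.1) for complex arithmetic, which is our own choice of representation and not the text's: a point $(x_1, x_2)$ is written as the complex number $x_1 + ix_2$, so that rotation is multiplication by $e^{i\theta}$ and the reflection $r$ is complex conjugation. The helper names are hypothetical.

```python
import cmath

def t(v):
    """Translation by the vector v, with points of the plane written as complex numbers."""
    return lambda z: z + v

def rho(theta):
    """Rotation through the angle theta about the origin."""
    return lambda z: cmath.exp(1j * theta) * z

def r(z):
    """Reflection about the e1-axis (the real axis)."""
    return z.conjugate()

def same(f, g):
    """Compare two maps on sample points that include three noncollinear ones."""
    samples = (0j, 1 + 0j, 2 - 3j, -0.5 + 4j)
    return all(abs(f(z) - g(z)) < 1e-12 for z in samples)

theta, eta = 0.9, -2.3
v, w = 1.5 - 2j, -0.25 + 1j

# rho_theta t_v = t_{v'} rho_theta, where v' = rho_theta(v)
rule1 = same(lambda z: rho(theta)(t(v)(z)), lambda z: t(rho(theta)(v))(rho(theta)(z)))
# r t_v = t_{v'} r, where v' = r(v)
rule2 = same(lambda z: r(t(v)(z)), lambda z: t(r(v))(r(z)))
# r rho_theta = rho_{-theta} r
rule3 = same(lambda z: r(rho(theta)(z)), lambda z: rho(-theta)(r(z)))
# t_v t_w = t_{v+w}, rho_theta rho_eta = rho_{theta+eta}, rr = 1
rule4 = same(lambda z: t(v)(t(w)(z)), t(v + w))
rule5 = same(lambda z: rho(theta)(rho(eta)(z)), rho(theta + eta))
rule6 = same(lambda z: r(r(z)), lambda z: z)
```

Agreement on finitely many sample points is of course only a sanity check; the rules themselves follow from the formulas.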


The next theorem describes the isometries of the plane geometrically. 


Theorem 6.3.4 Every isometry of the plane has one of the following forms: 
(a) orientation-preserving isometries: 

(i) translation: a map $t_v$ that sends $p \mapsto p + v$. 

(ii) rotation: rotation of the plane through a nonzero angle $\theta$ about some point. 
(b) orientation-reversing isometries: 


(i) reflection: a bilateral symmetry about a line $\ell$. 
(ii) glide reflection (or glide for short): reflection about a line $\ell$, followed by translation 
by a nonzero vector parallel to $\ell$. 


The proof of this remarkable theorem is below. One of its consequences is that the 
composition of rotations about two different points is a rotation about a third point, unless it 
is a translation. This isn't obvious, but it follows from the theorem, because the composition 
preserves orientation. 

Some compositions are easier to visualize. The composition of rotations through 
angles $\alpha$ and $\beta$ about the same point is a rotation about that point, through the angle 
$\alpha + \beta$. The composition of translations by the vectors $a$ and $b$ is the translation by their 
sum $a + b$. 

The composition of reflections about nonparallel lines $\ell_1$, $\ell_2$ is a rotation about the 
intersection point $p = \ell_1 \cap \ell_2$. This also follows from the theorem, because the composition 
is orientation-preserving, and it fixes $p$. The composition of reflections about parallel lines 
is a translation by a vector orthogonal to the lines. 


Proof of Theorem 6.3.4. We consider orientation-preserving isometries first. Let $f$ be an 
isometry that preserves orientation but is not a translation. We must prove that $f$ is a 
rotation about some point. We choose coordinates to write the formula for $f$ as $m = t_a\rho_\theta$ 
as in (6.3.3). Since $m$ is not a translation, $\theta \ne 0$. 


Lemma 6.3.5 An isometry $f$ that has the form $m = t_a\rho_\theta$, with $\theta \ne 0$, is a rotation through 
the angle $\theta$ about a point in the plane. 


Proof. To simplify notation, we denote $\rho_\theta$ by $\rho$. To show that $f$ represents a rotation with 
angle $\theta$ about some point $p$, we change coordinates by a translation $t_p$. We hope to choose 
$p$ so that the new formula for the isometry $f$ becomes $m' = \rho$. If so, then $f$ will be rotation 
with angle $\theta$ about the point $p$. 

The rule for change of coordinates is $t_p(x') = x$, and therefore the new formula for $f$ is 
$m' = t_p^{-1}mt_p = t_{-p}t_a\rho t_p$ (6.2.11). We use the rules (6.3.3): $\rho t_p = t_{p'}\rho$, where $p' = \rho(p)$. 
Then if $b = -p + a + p' = a + \rho(p) - p$, we will have $m' = t_b\rho$. We wish to choose $p$ such 
that $b = 0$. 

Let $I$ denote the identity operator, and let $c = \cos\theta$ and $s = \sin\theta$. The matrix of the 
linear operator $I - \rho$ is 


(6.3.6) $\begin{bmatrix} 1-c & s \\ -s & 1-c \end{bmatrix}.$ 


Its determinant is $2 - 2c = 2 - 2\cos\theta$. The determinant isn't zero unless $\cos\theta = 1$, and this 
happens only when $\theta = 0$. Since $\theta \ne 0$, the equation $(I - \rho)p = a$ has a unique solution for 
$p$. The equation can be solved explicitly when needed. □ 
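
As an illustration of solving the equation explicitly, the sketch below inverts the 2×2 matrix (6.3.6) by the usual adjugate formula and checks that the resulting point is fixed by $t_a\rho_\theta$. The function names `fixed_point` and `apply` and the sample data are assumptions made for the example.

```python
import math

def fixed_point(a, theta):
    """Solve (I - rho) p = a, where I - rho has matrix [[1 - c, s], [-s, 1 - c]]."""
    c, s = math.cos(theta), math.sin(theta)
    det = 2 - 2 * c                       # determinant of I - rho
    if math.isclose(det, 0.0, abs_tol=1e-15):
        raise ValueError("cos(theta) = 1: the isometry is a translation")
    # The inverse of [[1 - c, s], [-s, 1 - c]] is (1/det) * [[1 - c, -s], [s, 1 - c]].
    p1 = ((1 - c) * a[0] - s * a[1]) / det
    p2 = (s * a[0] + (1 - c) * a[1]) / det
    return (p1, p2)

def apply(a, theta, x):
    """The isometry t_a rho_theta applied to the point x."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1] + a[0], s * x[0] + c * x[1] + a[1])

a, theta = (1.0, 0.0), math.pi / 2
p = fixed_point(a, theta)        # approximately (0.5, 0.5)
image = apply(a, theta, p)       # should coincide with p up to rounding
```

For this choice of data, a quarter turn about the origin followed by the unit translation along the $e_1$-axis is a quarter turn about the point (1/2, 1/2).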


The point $p$ is the fixed point of the isometry $t_a\rho_\theta$, and it can be found geometrically, 
as illustrated below. The line $\ell$ passes through the origin and is perpendicular to the vector 
$a$. The sector with angle $\theta$ is situated so as to be bisected by $\ell$, and the fixed point $p$ is 
determined by inserting the vector $a$ into the sector, as shown. 


(6.3.7) The fixed point of the isometry $t_a\rho_\theta$. 


To complete the proof of Theorem 6.3.4, we show that an orientation-reversing isometry 
$m = t_a\rho_\theta r$ is a glide or a reflection. To do this, we change coordinates. The isometry $\rho_\theta r$ 
is a reflection about a line $\ell_0$ through the origin. We may as well rotate coordinates so that 
$\ell_0$ becomes the horizontal axis. In the new coordinate system, the reflection becomes our 
standard reflection $r$, and the translation $t_a$ remains a translation, though the coordinates of 
the vector $a$ will have changed. Let's use the same symbol $a$ for this new vector. In the new 
coordinate system, the isometry becomes $m = t_a r$. It acts as 

$m\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 + a_1 \\ -x_2 + a_2 \end{bmatrix}.$ 

This isometry is the glide obtained by reflection about the line $\ell: \{x_2 = \frac{1}{2}a_2\}$, followed by 
translation by the vector $a_1e_1$. If $a_1 = 0$, $m$ is a reflection. 
This completes the proof of Theorem 6.3.4. □ 
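
The last step can be checked on sample points: reflecting about the $e_1$-axis and then translating by $a$ gives the same result as reflecting about the horizontal line at height $a_2/2$ and then translating by $a_1e_1$. The function names in this sketch are hypothetical.

```python
def m(a, x):
    """The isometry t_a r: reflect about the e1-axis, then translate by a."""
    return (x[0] + a[0], -x[1] + a[1])

def reflect_about_horizontal(height, x):
    """Reflection about the horizontal line x2 = height."""
    return (x[0], 2 * height - x[1])

def glide(a, x):
    """Reflect about the line x2 = a2 / 2, then translate by a1 * e1."""
    y = reflect_about_horizontal(a[1] / 2, x)
    return (y[0] + a[0], y[1])

a = (3.0, 4.0)
points = [(1.0, 1.0), (0.0, 0.0), (-2.5, 7.0)]
agree = all(m(a, x) == glide(a, x) for x in points)

# With a1 = 0 the map is a reflection: points on the line x2 = a2 / 2 stay fixed.
on_line = (7.0, 2.0)
fixed = m((0.0, 4.0), on_line) == on_line
```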


Corollary 6.3.8 The glide line of the isometry $t_a\rho_\theta r$ is parallel to the line of reflection 
of $\rho_\theta r$. □ 


The isometries that fix the origin are the orthogonal linear operators, so when 
coordinates are chosen, the orthogonal group $O_2$ becomes a subgroup of the group of 
isometries M. We may also consider the subgroup of M of isometries that fix a point of the 
plane other than the origin. The relationship of this group with the orthogonal group is given 
in the next proposition. 


Proposition 6.3.9 Assume that coordinates in the plane have been chosen, so that the 
orthogonal group $O_2$ becomes the subgroup of M of isometries that fix the origin. Then the group 
of isometries that fix a point $p$ of the plane is the conjugate subgroup $t_pO_2t_p^{-1}$. 


Proof. If an isometry $m$ fixes $p$, then $t_p^{-1}mt_p$ fixes the origin $o$: $t_p^{-1}mt_p\,o = t_p^{-1}m\,p = t_p^{-1}p = o$. 
Conversely, if $m$ fixes $o$, then $t_pmt_p^{-1}$ fixes $p$. □ 

One can visualize the rotation about a point $p$ this way: First translate by $t_{-p}$ to move $p$ to 
the origin, then rotate about the origin, then translate back to $p$. 
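
The three steps just described can be written out directly. As a convenience that is our own assumption and not the text's notation, points of the plane are represented here as complex numbers, so that rotation about the origin is multiplication by $e^{i\theta}$; the name `rotate_about` is hypothetical.

```python
import cmath

def rotate_about(p, theta, z):
    """Rotation through the angle theta about the point p, as the conjugate t_p rho t_p^{-1}."""
    moved = z - p                             # translate by t_{-p}, moving p to the origin
    turned = cmath.exp(1j * theta) * moved    # rotate about the origin
    return turned + p                         # translate back by t_p

p = 1 + 0j
center_image = rotate_about(p, 0.8, p)                # the center p is fixed
quarter_turn = rotate_about(p, cmath.pi / 2, 1 + 1j)  # approximately 0
```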

We go back to the homomorphism $\pi: M \to O_2$ that was defined in (6.2.10). The 
discussion above shows this: 


Proposition 6.3.10 Let $p$ be a point of the plane, and let $\rho_{\theta,p}$ denote rotation through the 
angle $\theta$ about $p$. Then $\pi(\rho_{\theta,p}) = \rho_\theta$. Similarly, if $r_\ell$ is reflection about a line $\ell$ or a glide 
with glide line $\ell$ that is parallel to the x-axis, then $\pi(r_\ell) = r$. □ 


Points and Vectors 


In most of this book, there is no convincing reason to distinguish a point $p$ of the plane 
$P = \mathbb{R}^2$ from the vector that goes from the origin $o$ to $p$, which is often written as $\overrightarrow{op}$ in 
calculus books. However, when working with isometries, it is best to maintain the distinction. 
So we introduce another copy of the plane, we call it $V$, and we think of its elements as 
translation vectors. Translation by a vector $v$ in $V$ acts on a point $p$ of $P$ as $t_v(p) = p + v$. 
It shifts every point of the plane by $v$. 

Both $V$ and $P$ are planes. The difference between them becomes apparent only when 
we change coordinates. Suppose that we shift coordinates in $P$ by a translation: $\eta = t_w$. The 
rule for changing coordinates is $\eta(p') = p$, or $p' + w = p$. At the same time, an isometry 
$m$ changes to $m' = \eta^{-1}m\eta = t_{-w}mt_w$ (6.2.11). If we apply this rule with $m = t_v$, then 
$m' = t_{-w}t_vt_w = t_v$. The points of $P$ get new coordinates, but the translation vectors are 
unchanged. 

On the other hand, if we change coordinates by an orthogonal operator $\varphi$, then 
$\varphi(p') = p$, and if $m = t_v$, then $m' = \varphi^{-1}t_v\varphi = t_{v'}$, where $v' = \varphi^{-1}v$. So $\varphi v' = v$. The effect 
of change of coordinates by an orthogonal operator is the same on $P$ as on $V$. 

The only difference between $P$ and $V$ is that the origin in $P$ needn't be fixed, whereas 
the zero vector is picked out as the origin in $V$. 

Orthogonal operators act on $V$, but they don't act on $P$ unless the origin is chosen. 


6.4 FINITE GROUPS OF ORTHOGONAL OPERATORS ON THE PLANE 


Theorem 6.4.1 Let G be a finite subgroup of the orthogonal group $O_2$. There is an integer 
$n$ such that G is one of the following groups: 


(a) $C_n$: the cyclic group of order $n$ generated by the rotation $\rho_\theta$, where $\theta = 2\pi/n$. 
(b) $D_n$: the dihedral group of order $2n$ generated by two elements: the rotation $\rho_\theta$, where 
$\theta = 2\pi/n$, and a reflection $r'$ about a line $\ell$ through the origin. 


We will take a moment to describe the dihedral group $D_n$ before proving the theorem. 
This group depends on the line of reflection, but if we choose coordinates so that $\ell$ 
becomes the horizontal axis, the group will contain our standard reflection $r$, the one whose 
matrix is 


(6.4.2) $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$ 

Then if we also write $\rho$ for $\rho_\theta$, the $2n$ elements of the group will be the $n$ powers $\rho^i$ of 
$\rho$ and the $n$ products $\rho^ir$. The rule for commuting $\rho$ and $r$ is 


$r\rho = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} c & -s \\ s & c \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \rho^{-1}r,$ 


where $c = \cos\theta$, $s = \sin\theta$, and $\theta = 2\pi/n$. 
To conform with a more customary notation for groups, we denote the rotation $\rho_{2\pi/n}$ 
by $x$, and the reflection $r$ by $y$. 


Proposition 6.4.3 The dihedral group $D_n$ has order $2n$. It is generated by two elements $x$ 
and $y$ that satisfy the relations 


$x^n = 1$, $y^2 = 1$, $yx = x^{-1}y$. 

The elements of $D_n$ are 

$1, x, x^2, \ldots, x^{n-1};\ y, xy, x^2y, \ldots, x^{n-1}y.$ □ 


Using the first two relations (6.4.3), the third one can be rewritten in various ways. It is 
equivalent to 


(6.4.4) $xyxy = 1$, and also to $yx = x^{n-1}y$. 


When $n = 3$, the relations are the same as for the symmetric group $S_3$ (2.2.6). 
Corollary 6.4.5 The dihedral group $D_3$ and the symmetric group $S_3$ are isomorphic. □ 
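
The relations of Proposition 6.4.3 are enough to compute in the dihedral group. In the sketch below, an element $x^iy^j$ is stored as the pair (i, j), and products are brought back to that form with the relation $yx = x^{-1}y$. The encoding and the names `dihedral`, `mul`, and `power` are our own assumptions.

```python
def dihedral(n):
    """Elements x^i y^j of D_n, stored as pairs (i, j) with 0 <= i < n and j in {0, 1}."""
    return [(i, j) for j in range(2) for i in range(n)]

def mul(n, g, h):
    """Multiply x^i y^j by x^k y^l, moving y past x with the relation yx = x^{-1}y."""
    (i, j), (k, l) = g, h
    sign = -1 if j else 1          # y^j x^k = x^{sign * k} y^j
    return ((i + sign * k) % n, (j + l) % 2)

def power(n, g, k):
    """The k-th power of g, computed by repeated multiplication."""
    result = (0, 0)
    for _ in range(k):
        result = mul(n, result, g)
    return result

n = 5
e, x, y = (0, 0), (1, 0), (0, 1)
elements = dihedral(n)
x_inv = power(n, x, n - 1)

relations_hold = (
    power(n, x, n) == e                       # x^n = 1
    and mul(n, y, y) == e                     # y^2 = 1
    and mul(n, y, x) == mul(n, x_inv, y)      # yx = x^{-1}y = x^{n-1}y
    and power(n, mul(n, x, y), 2) == e        # xyxy = 1
)
closed = all(mul(n, g, h) in elements for g in elements for h in elements)
```

The list `elements` has $2n$ entries, in agreement with the order stated in the proposition.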


For $n > 3$, the dihedral and symmetric groups are not isomorphic, because $D_n$ has order $2n$, 
while $S_n$ has order $n!$. 


When $n \ge 3$, the elements of the dihedral group $D_n$ are the orthogonal operators that 
carry a regular n-sided polygon $\Delta$ to itself, the group of symmetries of $\Delta$. This is easy to 
see, and it follows from the theorem: A regular n-gon is carried to itself by the rotation by 
$2\pi/n$ about its center, and also by some reflections. Theorem 6.4.1 identifies the group of all 
symmetries as $D_n$. 


The dihedral groups $D_1$, $D_2$ are too small to be symmetry groups of an n-gon in the 
usual sense. $D_1$ is the group $\{1, r\}$ of two elements. So it is a cyclic group, as is $C_2$. But 
the element $r$ of $D_1$ is a reflection, while the element different from the identity in $C_2$ is the 
rotation with angle $\pi$. The group $D_2$ contains the four elements $\{1, \rho, r, \rho r\}$, where $\rho$ is 
the rotation with angle $\pi$ and $\rho r$ is the reflection about the vertical axis. This group 
is isomorphic to the Klein four group. 

If we like, we can think of $D_1$ and $D_2$ as groups of symmetry of the 1-gon and 2-gon: 


1-gon. 2-gon. 

We begin the proof of Theorem 6.4.1 now. A subgroup $\Gamma$ of the additive group $\mathbb{R}^+$ of 
real numbers is called discrete if there is a (small) positive real number $\epsilon$ such that every 
nonzero element $c$ of $\Gamma$ has absolute value $\ge \epsilon$. 


Lemma 6.4.6 Let $\Gamma$ be a discrete subgroup of $\mathbb{R}^+$. Then either $\Gamma = \{0\}$, or $\Gamma$ is the set $\mathbb{Z}a$ of 
integer multiples of a positive real number $a$. 


Proof. This is very similar to the proof of Theorem 2.3.3, that a nonzero subgroup of $\mathbb{Z}^+$ has 
the form $\mathbb{Z}n$. 

If $a$ and $b$ are distinct elements of $\Gamma$, then since $\Gamma$ is a group, $a - b$ is in $\Gamma$, and 
$|a - b| \ge \epsilon$. Distinct elements of $\Gamma$ are separated by a distance at least $\epsilon$. Since only finitely 
many elements separated by $\epsilon$ can fit into any bounded interval, a bounded interval contains 
finitely many elements of $\Gamma$. 

Suppose that $\Gamma \ne \{0\}$. Then $\Gamma$ contains a nonzero element $b$, and since it is a group, 
$\Gamma$ contains $-b$ as well. So it contains a positive element, say $a'$. We choose the smallest positive 
element $a$ in $\Gamma$. We can do this because we only need to choose the smallest element of the 
finite subset of $\Gamma$ in the interval $0 < x \le a'$. 

We show that $\Gamma = \mathbb{Z}a$. Since $a$ is in $\Gamma$ and $\Gamma$ is a group, $\mathbb{Z}a \subset \Gamma$. Let $b$ be an element of 
$\Gamma$. Then $b = ra$ for some real number $r$. We take out the integer part of $r$, writing $r = m + r_0$ 
with $m$ an integer and $0 \le r_0 < 1$. Since $\Gamma$ is a group, $b' = b - ma$ is in $\Gamma$ and $b' = r_0a$. Then 
$0 \le b' < a$. Since $a$ is the smallest positive element in $\Gamma$, $b'$ must be zero. So $b = ma$, which 
is in $\mathbb{Z}a$. This shows that $\Gamma \subset \mathbb{Z}a$, and therefore that $\Gamma = \mathbb{Z}a$. □ 


Proof of Theorem 6.4.1. Let G be a finite subgroup of $O_2$. We want to show that G is $C_n$ 
or $D_n$. We remember that the elements of $O_2$ are the rotations $\rho_\theta$ and the reflections $\rho_\theta r$. 


Case 1: All elements of G are rotations. 


We must prove that G is cyclic. Let $\Gamma$ be the set of real numbers $\theta$ such that $\rho_\theta$ is in 
G. Then $\Gamma$ is a subgroup of the additive group $\mathbb{R}^+$, and it contains $2\pi$. Since G is finite, $\Gamma$ is 
discrete. So $\Gamma$ has the form $\mathbb{Z}\alpha$. Then G consists of the rotations through integer multiples 
of the angle $\alpha$. Since $2\pi$ is in $\Gamma$, it is an integer multiple of $\alpha$. Therefore $\alpha = 2\pi/n$ for some 
integer $n$, and $G = C_n$. 


Case 2: G contains a reflection. 


We adjust our coordinates so that the standard reflection $r$ is in G. Let H denote the 
subgroup consisting of the rotations that are elements of G. We apply what has been proved 
in Case 1 to conclude that H is the cyclic group generated by $\rho_\theta$, for some angle $\theta = 2\pi/n$. 
Then the $2n$ products $\rho_\theta^k$ and $\rho_\theta^kr$, for $0 \le k \le n - 1$, are in G, so G contains the dihedral 
group $D_n$. We claim that $G = D_n$, and to show this we take any element $g$ of G. Then $g$ 
is either a rotation or a reflection. If $g$ is a rotation, then by definition of H, $g$ is in H. The 
elements of H are also in $D_n$, so $g$ is in $D_n$. If $g$ is a reflection, we write it in the form $\rho_\alpha r$ 
for some rotation $\rho_\alpha$. Since $r$ is in G, so is the product $gr = \rho_\alpha$. Therefore $\rho_\alpha$ is a power of 
$\rho_\theta$, and again, $g$ is in $D_n$. □ 

Theorem 6.4.7 Fixed Point Theorem. Let G be a finite group of isometries of the plane. 
There is a point in the plane that is fixed by every element of G, a point $p$ such that $g(p) = p$ 
for all $g$ in G. 


Proof. This is a nice geometric argument. Let $s$ be any point in the plane, and let S be the 
set of points that are the images of $s$ under the various isometries in G. So each element $s'$ 
of S has the form $s' = g(s)$ for some $g$ in G. This set is called the orbit of $s$ for the action 
of G. The element $s$ is in the orbit because the identity element 1 is in G, and $s = 1(s)$. A 
typical orbit for the case that G is the group of symmetries of a regular pentagon is depicted 
below, together with the fixed point $p$ of the operation. 

[Figure: a typical orbit, with the points $p$ and $s$ marked.] 

Any element of G will permute the orbit S. In other words, if $s'$ is in S and $h$ is in G, 
then $h(s')$ is in S: Say that $s' = g(s)$, with $g$ in G. Since G is a group, $hg$ is in G. Then 
$hg(s)$ is in S and is equal to $h(s')$. 

We list the elements of S arbitrarily, writing $S = \{s_1, \ldots, s_n\}$. The fixed point we are 
looking for is the centroid, or center of gravity of the orbit, defined as 


(6.4.8) $p = \frac{1}{n}(s_1 + \cdots + s_n),$ 

where the right side is computed by vector addition, using an arbitrary coordinate system in 
the plane. 


Lemma 6.4.9 Isometries carry centroids to centroids: Let $S = \{s_1, \ldots, s_n\}$ be a finite set of 
points of the plane, and let $p$ be its centroid, as defined by (6.4.8). Let $m$ be an isometry. Let 
$m(p) = p'$ and $m(s_i) = s_i'$. Then $p'$ is the centroid of the set $S' = \{s_1', \ldots, s_n'\}$. □ 


The fact that the centroid of our set S is a fixed point follows. An element $g$ of G permutes 
the orbit S. It sends S to S and therefore it sends $p$ to $p$. □ 
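
The argument can be watched in action on a small example. The sketch below takes the symmetry group of a regular pentagon centered at a point other than the origin, computes the orbit of a sample point, and checks that the centroid of the orbit is fixed. Points are written as complex numbers, which is our own convenience, and the names `pentagon_group`, `center`, and `s` are assumptions made for the example.

```python
import cmath

def pentagon_group(center):
    """The ten symmetries of a regular pentagon centered at `center`, as maps on complex numbers."""
    omega = cmath.exp(2j * cmath.pi / 5)
    # Rotations about the center through multiples of 2*pi/5.
    rotations = [lambda z, k=k: center + omega ** k * (z - center) for k in range(5)]
    # Reflections about lines through the center.
    reflections = [lambda z, k=k: center + omega ** k * (z - center).conjugate()
                   for k in range(5)]
    return rotations + reflections

center = 2 + 1j
G = pentagon_group(center)

s = 3.5 + 1.4j                       # a sample point that is not on any line of reflection
orbit = [g(s) for g in G]            # the orbit of s
p = sum(orbit) / len(orbit)          # the centroid (6.4.8)

fixed_by_all = all(abs(g(p) - p) < 1e-12 for g in G)
```

In this example the centroid comes out as the center of the pentagon, up to rounding.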


Proof of Lemma 6.4.9. This can be deduced by physical reasoning. It can be shown 
algebraically too. To do so, it suffices to look separately at the cases $m = t_a$ and $m = \varphi$, where 
$\varphi$ is an orthogonal operator. Any isometry is obtained from such isometries by composition. 


Case 1: $m = t_a$ is a translation. Then $s_i' = s_i + a$ and $p' = p + a$. It is true that 

$p' = p + a = \frac{1}{n}((s_1 + a) + \cdots + (s_n + a)) = \frac{1}{n}(s_1' + \cdots + s_n').$ 

Case 2: $m = \varphi$ is a linear operator. Then 

$p' = \varphi(p) = \varphi\left(\frac{1}{n}(s_1 + \cdots + s_n)\right) = \frac{1}{n}(\varphi(s_1) + \cdots + \varphi(s_n)) = \frac{1}{n}(s_1' + \cdots + s_n').$ □ 

By combining Theorems 6.4.1 and 6.4.7 one obtains a description of the symmetry 
groups of bounded figures in the plane. 


Corollary 6.4.10 Let G be a finite subgroup of the group M of isometries of the plane. 
If coordinates are chosen suitably, G becomes one of the groups $C_n$ or $D_n$ described in 
Theorem 6.4.1. □ 


6.5 DISCRETE GROUPS OF ISOMETRIES 


In this section we discuss groups of symmetries of unbounded figures such as the one depicted 
in Figure 6.1.5. What I call the kaleidoscope principle can be used to construct a figure with 
a given group of symmetries. You have probably looked through a kaleidoscope. One sees 
a sector at the end of the tube, whose sides are bounded by two mirrors that run the length 
of the tube and are placed at an angle $\theta$, such as $\theta = \pi/6$. One also sees the reflection of the 
sector in each mirror, and then one sees the reflection of the reflection, and so on. There are 
usually some bits of colored glass in the sector, whose reflections form a pattern. 

There is a group involved. In the plane at the end of the kaleidoscope tube, let $\ell_1$ and 
$\ell_2$ be the lines that bound the sector formed by the mirrors. The group is a dihedral group, 
generated by the reflections $r_i$ about $\ell_i$. The product $r_1r_2$ of these reflections preserves 
orientation and fixes the point of intersection of the two lines, so it is a rotation. Its angle of 
rotation is $\pm 2\theta$. 
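
The claim about the angle can be confirmed numerically. In this sketch the two mirror lines pass through the origin, the first along the real axis and the second at the angle $\theta$ to it, and points are written as complex numbers, so that reflection about the line at angle $\alpha$ is $z \mapsto e^{2i\alpha}\bar{z}$. This setup and the name `reflection` are assumptions made for the example.

```python
import cmath

def reflection(alpha):
    """Reflection about the line through the origin at the angle alpha to the real axis."""
    return lambda z: cmath.exp(2j * alpha) * z.conjugate()

theta = cmath.pi / 6
r1 = reflection(0.0)      # mirror along the real axis
r2 = reflection(theta)    # mirror at the angle theta

samples = (1 + 0j, 0.3 - 2j, -1.5 + 0.5j)
# r1 r2 (apply r2 first, then r1) is the rotation through -2 * theta about the origin.
r1r2_is_rotation = all(abs(r1(r2(z)) - cmath.exp(-2j * theta) * z) < 1e-12 for z in samples)
# r2 r1 is the rotation through +2 * theta; the sign depends on the order of the product.
r2r1_is_rotation = all(abs(r2(r1(z)) - cmath.exp(2j * theta) * z) < 1e-12 for z in samples)
```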

One can use the same principle with any subgroup G of M. We won’t give precise 
reasoning to show this, but the method can be made precise. We start with a random figure 
R in the plane. Every element g of our group G will move R to a new position, call it gR. 
The figure F is the union of all the figures gR. An element h of the group sends gR to hgR, 
which is also a part of F, so it sends F to itself. If R is sufficiently random, G will be the 
group of symmetries of F. As we know from the kaleidoscope, the figure F is often very 
attractive. The result of applying this procedure when G is the group of symmetries of a 
regular pentagon is shown below. 


[Figure: the figure F obtained when G is the group of symmetries of a regular pentagon.] 


Of course many different figures have the same group of symmetry. But it is interesting and 
instructive to describe the groups. We are going to present a rough classification, which will 
be refined in the exercises. 


Some subgroups of M are too wild to have a reasonable geometry. For instance, if the 
angle $\theta$ at which the mirrors in a kaleidoscope are placed were not a rational multiple of $2\pi$, 
there would be infinitely many distinct reflections of the sector. We need to rule this 
possibility out. 


Definition 6.5.1 A group G of isometries of the plane P is discrete if it does not contain 
arbitrarily small translations or rotations. More precisely, G is discrete if there is a positive 
real number € so that: 


(i) if an element of G is the translation by a nonzero vector a, then the length of a is at 
least €: |a| > €, and 

(ii) if an element of G is the rotation through a nonzero angle 9 about some point of the 
plane, then the absolute value of @ is at least €: |O| > e. 


Note: Since the translation vectors and the rotation angles form different sets, it might seem 
more appropriate to have separate lower bounds for them. However, in this definition we 
don’t care about the best bounds for the vectors and the angles, so we choose $\epsilon$ small enough
to take care of both at the same time. □


The translations and rotations are all of the orientation-preserving isometries (6.3.4),
and the conditions apply to all of them. We don’t impose a condition on the
orientation-reversing isometries. If m is a glide with nonzero glide vector v, then $m^2$ is the translation
$t_{2v}$. So a lower bound on the translation vectors determines a bound for the glide vectors too.


There are three main tools for analyzing a discrete group G:

(6.5.2)
• the translation group L, a subgroup of the group V of translation vectors,
• the point group $\overline{G}$, a subgroup of the orthogonal group $O_2$,
• an operation of $\overline{G}$ on L.


The Translation Group


The translation group L of G is the set of vectors v such that the translation $t_v$ is in G.

(6.5.3) $L = \{v \in V \mid t_v \in G\}$.


Since $t_v t_w = t_{v+w}$ and $t_v^{-1} = t_{-v}$, L is a subgroup of the additive group $V^+$ of all translation
vectors. The bound $\epsilon$ on translations in G bounds the lengths of the vectors in L:


(6.5.4) Every nonzero vector v in L has length $|v| \ge \epsilon$.


• A subgroup L of one of the additive groups $V^+$ or $\mathbb{R}^{2+}$ that satisfies condition (6.5.4) for
some $\epsilon > 0$ is called a discrete subgroup. (This is the definition made before for $\mathbb{R}^+$.)


A subgroup L is discrete if and only if the distance between distinct vectors a and b
of L is at least $\epsilon$. This is true because the distance is the length of $b - a$, and $b - a$ is in L
because L is a group. If (6.5.4) holds, then $|b - a| \ge \epsilon$. □

Theorem 6.5.5 Every discrete subgroup L of $V^+$ or of $\mathbb{R}^{2+}$ is one of the following:


(a) the zero group: L = {0}.

(b) the set of integer multiples of a nonzero vector a:
$L = \mathbb{Z}a = \{ma \mid m \in \mathbb{Z}\}$, or

(c) the set of integer combinations of two linearly independent vectors a and b:
$L = \mathbb{Z}a + \mathbb{Z}b = \{ma + nb \mid m, n \in \mathbb{Z}\}$.


Groups of the third type listed above are called lattices, and the generating set (a, b) is called
a lattice basis.


(6.5.6) A Lattice 


Lemma 6.5.7 Let L be a discrete subgroup of $V^+$ or $\mathbb{R}^{2+}$.


(a) A bounded region of the plane contains only finitely many points of L.
(b) If L is not the trivial group, it contains a nonzero vector of minimal length.


Proof. (a) Since the elements of L are separated by a distance at least $\epsilon$, a small square can
contain at most one point of L. A region of the plane is bounded if it is contained in some 
large rectangle. We can cover any rectangle by finitely many small squares, each of which 
contains at most one point of L. 


(b) We say that a vector v is a nonzero vector of minimal length of L if L contains no shorter 
nonzero vector. To show that such a vector exists, we use the hypothesis that L is not the 
trivial group. There is some nonzero vector a in L. Then the disk of radius |a| about 
the origin is a bounded region that contains a and finitely many other nonzero points of L. 
Some of those points will have minimal length. □
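The search described in part (b) can be tried out numerically. The following is a minimal sketch, not part of the text's argument: it assumes that L is given by a lattice basis (a, b) with integer coordinates, and the function names `lattice_points_in_disk` and `shortest_nonzero_vector` are hypothetical. It lists the finitely many lattice points in the disk of radius |a| and picks a nonzero one of minimal length.

```python
import math

def lattice_points_in_disk(a, b, radius):
    """Return all integer combinations m*a + n*b of length at most radius."""
    det = a[0] * b[1] - a[1] * b[0]
    if det == 0:
        raise ValueError("a and b must be linearly independent")
    # By Cramer's rule m = det(v, b)/det and n = det(a, v)/det, so |v| <= radius
    # bounds |m| and |n|: only finitely many pairs (m, n) need to be inspected.
    m_max = math.ceil(radius * math.hypot(*b) / abs(det))
    n_max = math.ceil(radius * math.hypot(*a) / abs(det))
    points = []
    for m in range(-m_max, m_max + 1):
        for n in range(-n_max, n_max + 1):
            v = (m * a[0] + n * b[0], m * a[1] + n * b[1])
            if math.hypot(*v) <= radius + 1e-12:
                points.append(v)
    return points

def shortest_nonzero_vector(a, b):
    """Search the disk of radius |a| for a nonzero lattice vector of minimal length."""
    disk = lattice_points_in_disk(a, b, math.hypot(*a))
    candidates = [v for v in disk if v != (0, 0)]
    return min(candidates, key=lambda v: v[0] ** 2 + v[1] ** 2)

# Illustrative lattice (an assumption, not taken from the text).
shortest = shortest_nonzero_vector((3, 1), (1, 2))
```

The disk always contains a itself, so the list of candidates is never empty, which mirrors the role of the hypothesis that L is not trivial.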


Given a basis B = (u, w) of $\mathbb{R}^2$, we let $\Pi(B)$ denote the parallelogram with vertices
0, u, w, u + w. It consists of the linear combinations $ru + sw$ with $0 \le r \le 1$ and $0 \le s \le 1$.
We also denote by $\Pi'(B)$ the region obtained from $\Pi(B)$ by deleting the two edges
$[u, u + w]$ and $[w, u + w]$. It consists of the linear combinations $ru + sw$ with $0 \le r < 1$ and
$0 \le s < 1$.


Lemma 6.5.8 Let B = (u, w) be a basis of $\mathbb{R}^2$, and let L be the lattice of integer combinations
of B. Every vector v in $\mathbb{R}^2$ can be written uniquely in the form $v = x + v_0$, with x in L and
$v_0$ in $\Pi'(B)$.

Proof. Since B is a basis, every vector is a linear combination $ru + sw$, with real coefficients
r and s. We take out their integer parts, writing $r = m + r_0$ and $s = n + s_0$, with m, n integers
and $0 \le r_0, s_0 < 1$. Then $v = x + v_0$, where $x = mu + nw$ is in L and $v_0 = r_0 u + s_0 w$ is in
$\Pi'(B)$. There is just one way to do this. □
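The decomposition in this proof is easy to carry out by machine. Here is a small sketch under stated assumptions: vectors are pairs of rational numbers, the coefficients r and s are found by Cramer's rule, and the function name `decompose` is hypothetical.

```python
from fractions import Fraction
import math

def decompose(v, u, w):
    """Write v = x + v0 with x = m*u + n*w (m, n integers) and v0 in Pi'(B), B = (u, w)."""
    u = tuple(Fraction(c) for c in u)
    w = tuple(Fraction(c) for c in w)
    v = tuple(Fraction(c) for c in v)
    det = u[0] * w[1] - u[1] * w[0]
    if det == 0:
        raise ValueError("u and w must form a basis")
    # Coefficients of v = r*u + s*w, by Cramer's rule.
    r = (v[0] * w[1] - v[1] * w[0]) / det
    s = (u[0] * v[1] - u[1] * v[0]) / det
    # Take out the integer parts: r = m + r0, s = n + s0 with 0 <= r0, s0 < 1.
    m, n = math.floor(r), math.floor(s)
    r0, s0 = r - m, s - n
    x = (m * u[0] + n * w[0], m * u[1] + n * w[1])
    v0 = (r0 * u[0] + s0 * w[0], r0 * u[1] + s0 * w[1])
    return (m, n), x, v0

# Illustrative data (an assumption, not taken from the text).
coefficients, x, v0 = decompose((Fraction(9, 2), Fraction(15, 2)), (2, 0), (1, 3))
```

Exact rational arithmetic is used so that the integer parts are not disturbed by rounding.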


Proof of Theorem 6.5.5 It is enough to consider a discrete subgroup L of $\mathbb{R}^{2+}$. The case that
L is the zero group is included in the list. If $L \ne \{0\}$, there are two possibilities:


Case 1: All vectors in L lie on a line $\ell$ through the origin.


Then L is a subgroup of the additive group of $\ell^+$, which is isomorphic to $\mathbb{R}^+$. Lemma 6.4.6
shows that L has the form $\mathbb{Z}a$.


Case 2: The elements of L do not lie on a line.


In this case, L contains independent vectors a' and b', and then B' = (a', b') is a basis of $\mathbb{R}^2$.
We must show that there is a lattice basis for L.

We first consider the line $\ell$ spanned by a'. The subgroup $L \cap \ell$ of $\ell^+$ is discrete, and a'
isn’t zero. So by what has been proved in Case 1, $L \cap \ell$ has the form $\mathbb{Z}a$ for some vector a. We
adjust coordinates and rescale so that a becomes the vector $(1, 0)^t$.

Next, we replace $b' = (b'_1, b'_2)^t$ by $-b'$ if necessary, so that $b'_2$ becomes positive. We
look for a vector $b = (b_1, b_2)^t$ in L with $b_2$ positive, and otherwise as small as possible. A
priori, we have infinitely many elements to inspect. However, since b' is in L, we only need
to inspect the elements b such that $0 < b_2 \le b'_2$. Moreover, we may add a multiple of a to
b, so we may also assume that $0 \le b_1 < 1$. When this is done, b will be in a bounded region
that contains finitely many elements of L. We look through this finite set to find the required
element b, and we show that B = (a, b) is a lattice basis for L.

Let $\tilde{L} = \mathbb{Z}a + \mathbb{Z}b$. Then $\tilde{L} \subset L$. We must show that every element of L is in $\tilde{L}$, and
according to Lemma 6.5.8, applied to the lattice $\tilde{L}$, it is enough to show that the only element
of L in the region $\Pi'(B)$ is the zero vector. Let $c = (c_1, c_2)^t$ be a point of L in that region,
so that $0 \le c_1 < 1$ and $0 \le c_2 < b_2$. Since $b_2$ was chosen minimal, $c_2 = 0$, and c is on the line
$\ell$. Then c is an integer multiple of a, and since $0 \le c_1 < 1$, c = 0. □


The Point Group 


We turn now to the second tool for analyzing a discrete group of isometries. We choose
coordinates, and go back to the homomorphism $\pi: M \to O_2$ whose kernel is the group T
of translations (6.3.10). When we restrict this homomorphism to a discrete subgroup G, we
obtain a homomorphism


(6.5.9) $\pi|_G: G \to O_2$.


The point group $\overline{G}$ is the image of G in the orthogonal group $O_2$.

It is important to make a clear distinction between elements of the group G and
those of its point group $\overline{G}$. So to avoid confusion, we will put bars over symbols when they
represent elements of $\overline{G}$. For g in G, $\overline{g}$ will be an orthogonal operator.

By definition, a rotation $\overline{\rho}_\theta$ is in $\overline{G}$ if G contains an element of the form $t_a \rho_\theta$, and this
is a rotation through the same angle $\theta$ about some point of the plane (6.3.5). The inverse
image in G of an element $\overline{\rho}_\theta$ of $\overline{G}$ consists of the elements of G that are rotations through
the angle $\theta$ about various points of the plane.

Similarly, let $\ell$ denote the line of reflection of $\rho_\theta r$. As we have noted before, its angle
with the $e_1$-axis is $\frac{1}{2}\theta$ (5.1.17). The point group $\overline{G}$ contains $\overline{\rho}_\theta \overline{r}$ if there is an element $t_a \rho_\theta r$ in
G, and $t_a \rho_\theta r$ is a reflection or a glide reflection along a line parallel to $\ell$ (6.3.8). The inverse
image of $\overline{\rho}_\theta \overline{r}$ consists of all of the elements of G that are reflections or glides along lines
parallel to $\ell$. To sum up:


• The point group $\overline{G}$ records the angles of rotation and the slopes of the glide lines and the
lines of reflection, of elements of G.


Proposition 6.5.10 A discrete subgroup $\overline{G}$ of $O_2$ is finite, and is therefore either cyclic or
dihedral.


Proof. Since $\overline{G}$ contains no small rotations, the set $\Gamma$ of real numbers $\theta$ such that $\rho_\theta$ is in $\overline{G}$
is a discrete subgroup of the additive group $\mathbb{R}^+$ that contains $2\pi$. Lemma 6.4.6 tells us that
$\Gamma$ has the form $\mathbb{Z}\theta$, where $\theta = 2\pi/n$ for some integer n. At this point, the proof of Theorem
6.4.1 carries over. □


The Crystallographic Restriction 


If the translation group of a discrete group of isometries G is the trivial group, the restriction
of $\pi$ to G will be injective. In this case G will be isomorphic to its point group $\overline{G}$, and will
be cyclic or dihedral. The next proposition is our third tool for analyzing infinite discrete
groups. It relates the point group to the translation group.

Unless an origin is chosen, the orthogonal group $O_2$ doesn’t operate on the plane P.
But it does operate on the space V of translation vectors.


Proposition 6.5.11 Let G be a discrete subgroup of M. Let a be an element of its translation
group L, and let $\overline{g}$ be an element of its point group $\overline{G}$. Then $\overline{g}(a)$ is in L.


We can restate this proposition by saying that the elements of $\overline{G}$ map L to itself. So $\overline{G}$ is
contained in the group of symmetries of L, when L is regarded as a figure in the plane V.


Proof of Proposition 6.5.11 Let a and g be elements of L and G, respectively, let $\overline{g}$ be the
image of g in $\overline{G}$, and let $a' = \overline{g}(a)$. We will show that $t_{a'}$ is the conjugate $g t_a g^{-1}$. This will
show that $t_{a'}$ is in G, and therefore that a' is in L. We write $g = t_b \varphi$. Then $\varphi$ is in $O_2$ and
$\overline{g} = \varphi$. So $a' = \varphi(a)$. Using the formulas (6.2.8), we find:


$g t_a g^{-1} = (t_b \varphi) t_a (\varphi^{-1} t_{-b}) = t_b t_{a'} \varphi \varphi^{-1} t_{-b} = t_{a'}$. □


Note: It is important to understand that the group G does not operate on its translation 
group L. Indeed, it makes no sense to ask whether G operates on L, because the elements 
of G are isometries of the plane P, while L is a subset of V. Unless an origin is fixed, P is not 
the same as V. If we fix the origin in P, we can identify P with V. Then the question makes 
sense. We may ask: Is there a point of P so that with that point as the origin, the elements 
of G carry L to itself? Sometimes yes, sometimes no. That depends on the group. □

The next theorem describes the point groups that can occur when the translation group
L is not trivial. 


Theorem 6.5.12 Crystallographic Restriction. Let L be a discrete subgroup of $V^+$ or $\mathbb{R}^{2+}$,
and let $H \subset O_2$ be a subgroup of the group of symmetries of L. Suppose that L is not the
trivial group. Then


(a) every rotation in H has order 1, 2, 3, 4, or 6, and
(b) H is one of the groups $C_n$ or $D_n$, and n = 1, 2, 3, 4, or 6.


In particular, rotations of order 5 are ruled out. There is no wallpaper pattern with five-fold
rotational symmetry. (“Quasi-periodic” patterns with five-fold symmetry do exist. See, for
example, [Senechal].)


Proof of the Crystallographic Restriction We prove (a). Part (b) follows from (a) and from
Theorem 6.4.1. Let $\rho$ be a rotation in H with angle $\theta$, and let a be a nonzero vector in L
of minimal length. Since H operates on L, $\rho(a)$ is also in L. Then $b = \rho(a) - a$ is in L
too, and since a has minimal length, $|b| \ge |a|$. Looking at the figure below, one sees that
$|b| < |a|$ when $\theta < 2\pi/6$. So we must have $\theta \ge 2\pi/6$. It follows that the group H is discrete,
hence finite, and that $\rho$ has order $\le 6$.


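[Figure: the vectors a and $\rho(a)$ drawn from the origin, and the vector $b = \rho(a) - a$ joining their tips.]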


The case that $\theta = 2\pi/5$ can be ruled out too, because for that angle, the element $b' =
\rho^2(a) + a$ is shorter than a:


[Figure: the vectors a and $\rho^2(a)$ drawn from the origin 0, and their sum $b'$.] □
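The two length comparisons used in this proof can be checked numerically. The sketch below is only an illustration: it normalizes the shortest vector a to 1 in the complex plane (an assumption made for convenience), and for a rotation $\rho$ of order n it tests whether some $\rho^k(a) - a$ or $\rho^k(a) + a$ is nonzero and shorter than a. The function name `passes_length_test` is hypothetical. The test is a necessary condition only; it does not by itself construct a lattice.

```python
import cmath
import math

def passes_length_test(n, tol=1e-9):
    """Return False if a rotation of order n would produce a nonzero vector shorter than a."""
    rho = cmath.exp(2j * math.pi / n)   # rotation through 2*pi/n, as a complex number
    a = 1 + 0j                          # shortest lattice vector, normalized to length 1
    for k in range(1, n):
        image = rho ** k * a
        for candidate in (image - a, image + a):
            length = abs(candidate)
            # A nonzero lattice vector strictly shorter than a contradicts minimality.
            if tol < length < 1 - tol:
                return False
    return True

allowed = [n for n in range(1, 13) if passes_length_test(n)]
```

For n = 5 the failing vector is $\rho^2(a) + a$, and for n > 6 it is $\rho(a) - a$, which matches the two steps of the proof.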


6.6 PLANE CRYSTALLOGRAPHIC GROUPS 


We go back to our discrete group of isometries $G \subset M$. We have seen that when L is the
trivial group, G is cyclic or dihedral. The discrete groups G such that L is infinite cyclic
(6.5.5)(b) are the symmetry groups of frieze patterns such as those shown in (6.1.3), (6.1.4).
We leave the classification of those groups as an exercise.

When L is a lattice, G is called a two-dimensional crystallographic group. These
crystallographic groups are the symmetry groups of two-dimensional crystals such as graphite.
We imagine a crystal to be infinitely large. Then the fact that the molecules are arranged 
regularly implies that they form an array having two independent translational symmetries. 
A wallpaper pattern also repeats itself in two different directions — once along the strips of 
paper because the pattern is printed using a roller, and a second time because strips of paper 
are glued to the wall side by side. The crystallographic restriction limits the possibilities and
allows one to classify crystallographic groups into 17 types. Representative patterns with the
various types of symmetry are illustrated in Figure (6.6.2). 

The point group $\overline{G}$ and the translation group L do not determine the group G
completely. Things are complicated by the fact that a reflection in $\overline{G}$ needn’t be the image
of a reflection in G. It may be represented in G only by glides, as in the brick pattern
that is illustrated below. This pattern (my favorite) is relatively subtle because its group of
symmetries doesn’t contain a reflection. It has rotational symmetries with angle $\pi$ about
the center of each brick. All of these rotations represent the same element $\overline{\rho}_\pi$ of the
point group $\overline{G}$. There are no nontrivial rotational symmetries with angles other than 0
and $\pi$. The pattern also has glide symmetry along the dashed line drawn in the figure, so
$\overline{G} = D_2 = \{\overline{1}, \overline{\rho}_\pi, \overline{r}, \overline{\rho}_\pi \overline{r}\}$.


One can determine the point group of a pattern fairly easily, in two steps: One looks
first for rotational symmetries. They are usually relatively easy to find. A rotation $\overline{\rho}_\theta$ in the
point group $\overline{G}$ is represented by a rotation with the same angle in the group G of symmetries
of the pattern. When the rotational symmetries have been found, one will know the integer
n such that the point group is $C_n$ or $D_n$. Then to distinguish $D_n$ from $C_n$, one looks to see
if the pattern has reflection or glide symmetry. If it does, $\overline{G} = D_n$, and if not, $\overline{G} = C_n$.


Plane Crystallographic Groups with a Fourfold Rotation in the Point Group 


As an example of the methods used to classify discrete groups of isometries, we analyze
groups whose point groups are $C_4$ or $D_4$.

Let G be such a group, let $\overline{\rho}$ denote the rotation with angle $\pi/2$ in $\overline{G}$, and let L be the
lattice of G, the set of vectors v such that $t_v$ is in G.


Lemma 6.6.2 The lattice L is square. 


Proof. We choose a nonzero vector a in L of minimal length. The point group operates on
L, so $\overline{\rho}(a) = b$ is in L and is orthogonal to a. We claim that (a, b) is a lattice basis for L.
Suppose not. Then according to Lemma 6.5.8, there will be a nonzero point of L in the region
$\Pi'$ consisting of the points $r_1 a + r_2 b$ with $0 \le r_i < 1$. Such a point w will be at a distance less
than |a| from one of the four vertices 0, a, b, a + b of the square. Call that vertex v. Then
$v - w$ is also in L, and $|v - w| < |a|$. This contradicts the choice of a. □


We choose coordinates and rescale so that a and b become the standard basis vectors
$e_1$ and $e_2$. Then L becomes the lattice of vectors with integer coordinates, and $\Pi'$ becomes
the set of vectors $(s, t)^t$ with $0 \le s < 1$ and $0 \le t < 1$. This determines coordinates in the
plane P up to a translation.


[Figure (6.6.2): Sample Patterns for the 17 Plane Crystallographic Groups.]


The orthogonal operators on V that send L to itself form the dihedral group $D_4$
generated by the rotation $\overline{\rho}$ through the angle $\pi/2$ and the standard reflection $\overline{r}$. Our
assumption is that $\overline{\rho}$ is in $\overline{G}$. If $\overline{r}$ is also in $\overline{G}$, then $\overline{G}$ is the dihedral group $D_4$. If not, $\overline{G}$ is
the cyclic group $C_4$. We describe the group G when $\overline{G}$ is $C_4$ first. Let g be an element of G
whose image in $\overline{G}$ is the rotation $\overline{\rho}$. Then g is a rotation through the angle $\pi/2$ about some
point p in the plane. We translate coordinates in the plane P so that the point p becomes
the origin. In this coordinate system, G contains the rotation $\rho = \rho_{\pi/2}$ about the origin.


Proposition 6.6.3 Let G be a plane crystallographic group whose point group $\overline{G}$ is the cyclic
group $C_4$. With coordinates chosen so that L is the lattice of points with integer coordinates,
and so that $\rho = \rho_{\pi/2}$ is an element of G, the group G consists of the products $t_v \rho^i$, with v
in L and $0 \le i < 4$:

$G = \{t_v \rho^i \mid v \in L\}$.


Proof. Let G' denote the set of elements of the form $t_v \rho^i$ with v in L. We must show that
G' = G. By definition of L, $t_v$ is in G, and also $\rho$ is in G. So $t_v \rho^i$ is in G, and therefore G'
is a subset of G.

To prove the opposite inclusion, let g be any element of G. Since the point group
$\overline{G}$ is $C_4$, every element of G preserves orientation. So g has the form $g = t_u \rho_\alpha$ for some
translation vector u and some angle $\alpha$. The image of this element in the point group is $\overline{\rho}_\alpha$,
so $\alpha$ is a multiple of $\pi/2$, and $\rho_\alpha = \rho^i$ for some i. Since $\rho$ is in G, $g \rho^{-i} = t_u$ is in G and u is
in L. Therefore g is in G'. □
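To get a feeling for this group, one can model its elements on a computer. The sketch below is an illustration under stated assumptions: an element $t_v \rho^i$ is stored as a pair `(v, i)` with v an integer vector and i taken modulo 4, and the function names are hypothetical. The composition rule used is $(t_v \rho^i)(t_w \rho^j) = t_{v + \rho^i(w)} \rho^{i+j}$, which follows from $\rho t_w = t_{\rho(w)} \rho$.

```python
def rotate(v, i):
    """Apply rho^i (rotation through i * pi/2 about the origin) to the vector v."""
    x, y = v
    for _ in range(i % 4):
        x, y = -y, x
    return (x, y)

def compose(g, h):
    """Product gh of g = t_v rho^i and h = t_w rho^j, again in the form t_u rho^k."""
    (v, i), (w, j) = g, h
    rw = rotate(w, i)
    return ((v[0] + rw[0], v[1] + rw[1]), (i + j) % 4)

def inverse(g):
    """Inverse of t_v rho^i, namely t_{-rho^{-i}(v)} rho^{-i}."""
    v, i = g
    rv = rotate(v, -i)
    return ((-rv[0], -rv[1]), (-i) % 4)

def apply(g, p):
    """Apply the isometry t_v rho^i to a point p: rotate first, then translate."""
    v, i = g
    rp = rotate(p, i)
    return (rp[0] + v[0], rp[1] + v[1])

# Illustrative elements (assumptions, not taken from the text).
g = ((2, -1), 1)
h = ((0, 3), 2)
gh = compose(g, h)
```

Since the product and the inverse of pairs with integer v are again such pairs, the set of all `(v, i)` is closed under the group operations, as the proposition asserts.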


We now consider the case that the point group $\overline{G}$ is $D_4$.


Proposition 6.6.4 Let G be a plane crystallographic group whose point group $\overline{G}$ is the
dihedral group $D_4$. Let coordinates be chosen so that L is the lattice of points with integer
coordinates and so that $\rho = \rho_{\pi/2}$ is an element of G. Also, let c denote the vector $(\frac{1}{2}, \frac{1}{2})^t$.
There are two possibilities:


(a) The elements of G are the products $t_v \varphi$ where v is in L and $\varphi$ is in $D_4$,
$G = \{t_v \rho^i \mid v \in L\} \cup \{t_v \rho^i r \mid v \in L\}$, or


(b) the elements of G are products $t_x \varphi$, with $\varphi$ in $D_4$. If $\varphi$ is a rotation, then x is in L, and
if $\varphi$ is a reflection, then x is in the coset c + L:


$G = \{t_v \rho^i \mid v \in L\} \cup \{t_u \rho^i r \mid u \in c + L\}$.


Proof. Let H be the subset of orientation-preserving isometries in G. This is a subgroup
of G whose lattice of translations is L, and which contains $\rho$. So its point group is $C_4$.
Proposition 6.6.3 tells us that H consists of the elements $t_v \rho^i$, with v in L.

The point group also contains the reflection $\overline{r}$. We choose an element g in G such that
$\overline{g} = \overline{r}$. It will have the form $g = t_u r$ for some vector u, but we don’t know whether or not u
is in L. Analyzing this case will require a bit of fiddling. Say that $u = (p, q)^t$.

We can multiply g on the left by a translation $t_v$ in G (i.e., v in L), to move u into the
region $\Pi'$ of points with $0 \le p, q < 1$. Let’s suppose this has been done.

We compute with $g = t_u r$, using the formulas (6.3.3):


$g^2 = t_u r t_u r = t_{u + ru}$ and $(g\rho)^2 = (t_u r \rho)^2 = t_{u + r\rho u}$.


These are elements of G, so $u + ru = (2p, 0)^t$ and $u + r\rho u = (p - q, q - p)^t$ are in the
lattice L. They are vectors with integer coordinates. Since $0 \le p, q < 1$ and 2p is an integer,
p is either 0 or $\frac{1}{2}$. Since $p - q$ is also an integer, q = 0 if p = 0 and $q = \frac{1}{2}$ if $p = \frac{1}{2}$. So
there are only two possibilities for u: Either $u = (0, 0)^t$, or $u = c = (\frac{1}{2}, \frac{1}{2})^t$. In the first case,
g = r, so G contains a reflection. This is case (a) of the proposition. The second possibility is
case (b). □
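The last step of the proof can be confirmed by a brute-force search. The following sketch is only a sanity check, and it makes an assumption that the proof does not need: it inspects only rational vectors $u = (p, q)^t$ whose coordinates have denominator at most 12. The function name `admissible_vectors` is hypothetical. It keeps the vectors in $\Pi'$ for which both $u + ru = (2p, 0)^t$ and $u + r\rho u = (p - q, q - p)^t$ have integer coordinates.

```python
from fractions import Fraction

def admissible_vectors(max_denominator=12):
    """Rational u = (p, q) with 0 <= p, q < 1 such that u + ru and u + r*rho*u are in L."""
    found = set()
    for d in range(1, max_denominator + 1):
        for pn in range(d):
            for qn in range(d):
                p, q = Fraction(pn, d), Fraction(qn, d)
                u_plus_ru = (2 * p, Fraction(0))        # u + ru, r the standard reflection
                u_plus_r_rho_u = (p - q, q - p)         # u + r(rho(u))
                coordinates = u_plus_ru + u_plus_r_rho_u
                if all(c.denominator == 1 for c in coordinates):
                    found.add((p, q))
    return sorted(found)

possibilities = admissible_vectors()
```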


6.7. ABSTRACT SYMMETRY: GROUP OPERATIONS 


The concept of symmetry can be applied to things other than geometric figures. Complex
conjugation $(a + bi) \rightsquigarrow (a - bi)$, for instance, may be thought of as a symmetry of the complex
numbers. Since complex conjugation is compatible with addition and multiplication, it is
called an automorphism of the field $\mathbb{C}$. Geometrically, it is the bilateral symmetry of the
complex plane about the real axis, but the statement that it is an automorphism refers to its
algebraic structure. The field $F = \mathbb{Q}[\sqrt{2}]$ whose elements are the real numbers of the form
$a + b\sqrt{2}$, with a and b rational, also has an automorphism, one that sends $a + b\sqrt{2} \rightsquigarrow a - b\sqrt{2}$.
This isn’t a geometric symmetry. Another example of abstract “bilateral” symmetry is given
by a cyclic group H of order 3. It has an automorphism that interchanges the two elements
different from the identity.

The set of automorphisms of an algebraic structure X, such as a group or a field, forms 
a group, the law of composition being composition of maps. Each automorphism should be 
thought of as a symmetry of X, in the sense that it is a permutation of the elements of X that 
is compatible with its algebraic structure. But the structure in this case is algebraic instead of 
geometric. 

So the words “automorphism” and “symmetry” are more or less synonymous, except
that “automorphism” is used to describe a permutation of a set that preserves an algebraic
structure, while “symmetry” often, though not always, refers to a permutation that preserves
a geometric structure.

Both automorphisms and symmetries are special cases of the more general concept of
a group operation. An operation of a group G on a set S is a rule for combining an element
g of G and an element s of S to get another element of S. In other words, it is a map
$G \times S \to S$. For the moment we denote the result of applying this law to elements g and s
by $g * s$. An operation is required to satisfy the following axioms:


Example 6.7.1 


(a) $1 * s = s$ for all s in S. (Here 1 is the identity of G.)
(b) associative law: $(gg') * s = g * (g' * s)$, for all g and g' in G and all s in S.


We usually omit the asterisk, and write the operation multiplicatively, as $g, s \rightsquigarrow gs$. With
multiplicative notation, the axioms are $1s = s$ and $(gg')s = g(g's)$.

Examples of sets on which a group operates can be found manywhere,¹ and most often,
it will be clear that the axioms for an operation hold. The group M of isometries of the plane
operates on the set of points of the plane. It also operates on the set of lines in the plane and
on the set of triangles in the plane. The symmetric group $S_n$ operates on the set of indices
$\{1, 2, \ldots, n\}$.
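For a small case, the two axioms can be verified exhaustively. The sketch below assumes a particular encoding that is not in the text: a permutation g of {1, 2, 3} is stored as a tuple whose entry in position s is the image of s, and the product gh means "apply h first, then g". The names `act` and `multiply` are hypothetical.

```python
from itertools import permutations

def act(g, s):
    """The operation of a permutation g on an index s."""
    return g[s - 1]

def multiply(g, h):
    """Product gh: apply h first, then g."""
    return tuple(g[h[i] - 1] for i in range(len(h)))

n = 3
group = list(permutations(range(1, n + 1)))   # the symmetric group S_3
identity = tuple(range(1, n + 1))
indices = range(1, n + 1)

# Axiom (a): 1s = s.  Axiom (b): (gg')s = g(g's).
identity_axiom = all(act(identity, s) == s for s in indices)
associative_axiom = all(
    act(multiply(g, h), s) == act(g, act(h, s))
    for g in group for h in group for s in indices
)
```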

The reason that such a law is called an operation is this: If we fix an element g of G
but let s vary in S, then left multiplication by g (or operation of g) defines a map from S to
itself. We denote this map, which describes the way the element g operates, by $m_g$:


(6.7.2) $m_g: S \to S$


is the map defined by $m_g(s) = gs$. It is a permutation of S, a bijective map, because it has
the inverse function $m_{g^{-1}}$: multiplication by $g^{-1}$.


• Given an operation of a group G on a set S, an element s of S will be sent to various other
elements by the group operation. We collect together those elements, obtaining a subset
called the orbit $O_s$ of s:


(6.7.3) $O_s = \{s' \in S \mid s' = gs \text{ for some } g \text{ in } G\}$.


When the group M of isometries of the plane operates on the set S of triangles in the
plane, the orbit $O_\Delta$ of a given triangle $\Delta$ is the set of all triangles congruent to $\Delta$. Another
orbit was introduced when we proved the existence of a fixed point for the operation of a 
finite group on the plane (6.4.7). 

The orbits for a group action are equivalence classes for the equivalence relation 


(6.7.4) $s \sim s'$ if $s' = gs$, for some g in G.


So if $s \sim s'$, that is, if $s' = gs$ for some g in G, then the orbits of s and of s' are the same.
Since they are equivalence classes:


(6.7.5) The orbits partition the set S. 


The group operates independently on each orbit. For example, the set of triangles of 
the plane is partitioned into congruence classes, and an isometry permutes each congruence 
class separately. 


If S consists of just one orbit, the operation of G is called transitive. This means 
that every element of S is carried to every other one by some element of the group. The 
symmetric group $S_n$ operates transitively on the set of indices $\{1, \ldots, n\}$. The group M of
isometries of the plane operates transitively on the set of points of the plane, and it operates 
transitively on the set of lines. It does not operate transitively on the set of triangles. 


• The stabilizer of an element s of S is the set of group elements that leave s fixed. It is a
subgroup of G that we often denote by $G_s$:


(6.7.6) $G_s = \{g \in G \mid gs = s\}$.


¹ While writing a book, the mathematician Masayoshi Nagata decided that the English language needed this
word; then he actually found it in a dictionary.

For instance, in the operation of the group M on the set of points of the plane, the stabilizer
of the origin is isomorphic to the group $O_2$ of orthogonal operators. The stabilizer of the
index n for the operation of the symmetric group $S_n$ is isomorphic to the subgroup $S_{n-1}$
of permutations of $\{1, \ldots, n-1\}$. Or, if S is the set of triangles in the plane, the stabilizer
of a particular equilateral triangle $\Delta$ is its group of symmetries, a subgroup of M that is
isomorphic to the dihedral group $D_3$.
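Orbits and stabilizers of a finite operation can be computed directly from the definitions (6.7.3) and (6.7.6). The sketch below does this for the symmetric group $S_4$ operating on the indices {1, 2, 3, 4}. The tuple encoding of permutations and the function names `act`, `orbit`, and `stabilizer` are assumptions made for the illustration.

```python
from itertools import permutations

def act(g, s):
    """The operation of a permutation g (a tuple of images) on an index s."""
    return g[s - 1]

def orbit(group, s):
    """The orbit O_s: all elements gs with g in the group."""
    return {act(g, s) for g in group}

def stabilizer(group, s):
    """The stabilizer G_s: all g in the group with gs = s."""
    return [g for g in group if act(g, s) == s]

S4 = list(permutations(range(1, 5)))
orbit_of_4 = orbit(S4, 4)
stabilizer_of_4 = stabilizer(S4, 4)
```

The stabilizer of the index 4 consists of the 6 permutations that fix 4, which agrees with the statement that it is isomorphic to $S_3$.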


Note: It is important to be clear about the following distinction: When we say that an isometry
m stabilizes a triangle $\Delta$, we don’t mean that m fixes the points of $\Delta$. The only isometry that
fixes every point of a triangle is the identity. We mean that in permuting the set of triangles,
m carries $\Delta$ to itself. □


Just as the kernel K of a group homomorphism $\varphi: G \to G'$ tells us when two elements
x and y of G have the same image, namely, if $x^{-1}y$ is in K, the stabilizer $G_s$ of an element s
of S tells us when two elements x and y of G act in the same way on s.


Proposition 6.7.7 Let S be a set on which a group G operates, let s be an element of S, and
let H be the stabilizer of s.


(a) If a and b are elements of G, then as = bs if and only if $a^{-1}b$ is in H, and this is true if
and only if b is in the coset aH.
(b) Suppose that as = s'. The stabilizer H' of s' is a conjugate subgroup:


$H' = aHa^{-1} = \{g \in G \mid g = aha^{-1} \text{ for some } h \text{ in } H\}$.


Proof. (a) as = bs if and only if $s = a^{-1}bs$, that is, if and only if $a^{-1}b$ is in H.


(b) If g is in $aHa^{-1}$, say $g = aha^{-1}$ with h in H, then $gs' = (aha^{-1})(as) = ahs = as = s'$,
so g stabilizes s'. This shows that $aHa^{-1} \subset H'$. Since $s = a^{-1}s'$, we can reverse the roles
of s and s', to conclude that $a^{-1}H'a \subset H$, which implies that $H' \subset aHa^{-1}$. Therefore
$H' = aHa^{-1}$. □


Note: Part (b) of the proposition explains a phenomenon that we have seen several times
before: When as = s', a group element g fixes s if and only if $aga^{-1}$ fixes s'.
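Both parts of the proposition can be checked by machine in a small case. The sketch below uses $S_4$ operating on {1, 2, 3, 4}, with s = 1 and one particular element a. The encoding of permutations as tuples and the specific choice of a are assumptions made for the illustration.

```python
from itertools import permutations

def act(g, s):
    """The operation of a permutation g (a tuple of images) on an index s."""
    return g[s - 1]

def multiply(g, h):
    """Product gh: apply h first, then g."""
    return tuple(g[h[i] - 1] for i in range(len(h)))

def inverse(g):
    """The inverse permutation of g."""
    inv = [0] * len(g)
    for i, image in enumerate(g, start=1):
        inv[image - 1] = i
    return tuple(inv)

G = list(permutations(range(1, 5)))
s = 1
a = (3, 1, 4, 2)                 # a hypothetical choice of a, with a(1) = 3
s_prime = act(a, s)

H = {g for g in G if act(g, s) == s}                    # stabilizer of s
H_prime = {g for g in G if act(g, s_prime) == s_prime}  # stabilizer of s'

# Part (a): the elements b with bs = as form the coset aH.
same_image = {b for b in G if act(b, s) == act(a, s)}
coset_aH = {multiply(a, h) for h in H}

# Part (b): the stabilizer of s' is the conjugate subgroup a H a^{-1}.
conjugate = {multiply(multiply(a, h), inverse(a)) for h in H}
```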


6.8 THE OPERATION ON COSETS 


Let H be a subgroup of a group G. As we know, the left cosets aH partition G. We often 
denote the set of left cosets of H in G by G/H, copying this from the notation used for 
quotient groups when the subgroup is normal (2.12.1), and we use the bracket notation [C] 
for a coset C, when it is considered as an element of the set G/H. 

The set of cosets G/H is not a group unless H is a normal subgroup. However,


• The group G operates on G/H in a natural way.


The operation is quite obvious: If g is an element of the group, and C is a coset, then
g[C] is defined to be the coset [gC], where $gC = \{gc \mid c \in C\}$. Thus if [C] = [aH], then
g[C] = [gaH]. The next proposition is elementary.

Proposition 6.8.1 Let H be a subgroup of a group G.


(a) The operation of G on the set G/H of cosets is transitive.
(b) The stabilizer of the coset [H] is the subgroup H. □


Note the distinction once more: Multiplication by an element h of H does not act trivially
on the elements of the coset H, but it sends the coset [H] to itself.

Please work carefully through the next example. Let G be the symmetric group $S_3$
with its usual presentation, and let H be the cyclic subgroup {1, y}. Its left cosets are


(6.8.2) $C_1 = H = \{1, y\}$, $C_2 = xH = \{x, xy\}$, $C_3 = x^2H = \{x^2, x^2y\}$


(see (2.8.4)), and G operates on the set of cosets $G/H = \{[C_1], [C_2], [C_3]\}$. The elements
x and y operate in the same way as on the set of indices {1, 2, 3}:


(6.8.3) $m_x \leftrightarrow (1\,2\,3)$ and $m_y \leftrightarrow (2\,3)$.


For instance, $yC_2 = \{yx, yxy\} = \{x^2y, x^2\} = C_3$.
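This example can be worked through by machine as well. The sketch below assumes that the usual presentation is realized by x = (1 2 3) and y = (1 2), encoded as tuples of images, with the product gh meaning "apply h first, then g". These choices and the function names are assumptions for the illustration. It computes the three cosets and the permutations of them induced by x and y.

```python
def multiply(g, h):
    """Product gh of permutations given as tuples of images: apply h first, then g."""
    return tuple(g[h[i] - 1] for i in range(len(h)))

e = (1, 2, 3)
x = (2, 3, 1)          # the 3-cycle (1 2 3)
y = (2, 1, 3)          # the transposition (1 2)
x2 = multiply(x, x)

H = frozenset({e, y})

def left_coset(a, subset):
    """The set aC = {ac | c in C}."""
    return frozenset(multiply(a, c) for c in subset)

# The cosets C_1 = H, C_2 = xH, C_3 = x^2 H of (6.8.2).
cosets = [left_coset(e, H), left_coset(x, H), left_coset(x2, H)]

def induced_permutation(g):
    """For each i, the index j such that g C_i = C_j (indices start at 1)."""
    return tuple(cosets.index(left_coset(g, C)) + 1 for C in cosets)

m_x = induced_permutation(x)
m_y = induced_permutation(y)
```

In this encoding `m_x` is the tuple of images of the cycle (1 2 3), and `m_y` is that of the transposition (2 3), as in (6.8.3).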
The next proposition, sometimes called the orbit-stabilizer theorem, shows how an 
arbitrary group operation can be described in terms of operations on cosets. 


Proposition 6.8.4 Let S be a set on which a group G operates, and let s be an element
of S. Let H and $O_s$ be the stabilizer and orbit of s, respectively. There is a bijective map
$\epsilon: G/H \to O_s$ defined by $[aH] \rightsquigarrow as$. This map is compatible with the operations of the
group: $\epsilon(g[C]) = g\epsilon([C])$ for every coset C and every element g in G.


For example, the dihedral group $D_5$ operates on the vertices of a regular pentagon.
Let V denote the set of vertices, and let H be the stabilizer of a particular vertex. There is
a bijective map $D_5/H \to V$. In the operation of the group M of isometries of the plane P,
the orbit of a point is the set of all points of P. The stabilizer of the origin is the group $O_2$ of
orthogonal operators, and there is a bijective map $M/O_2 \to P$. Similarly, if H denotes the
stabilizer of a line and if $\mathcal{L}$ denotes the set of all lines in the plane, there is a bijective map
$M/H \to \mathcal{L}$.


Proof of Proposition (6.8.4). It is clear that the map $\epsilon$ defined in the statement of the
proposition will be compatible with the operation of the group, if it exists. Symbolically, $\epsilon$
simply replaces H by the symbol s. What is not so clear is that the rule $[gH] \rightsquigarrow gs$ defines a
map at all. Since many symbols gH represent the same coset, we must show that if a and b
are group elements, and if the cosets aH and bH are equal, then as and bs are equal too.
Suppose that aH = bH. Then $a^{-1}b$ is in H (2.8.5). Since H is the stabilizer of s, $a^{-1}bs = s$,
and therefore as = bs. Our definition is legitimate, and reading this reasoning backward,
we also see that $\epsilon$ is an injective map. Since $\epsilon$ carries [gH] to gs, which can be an arbitrary
element of $O_s$, $\epsilon$ is surjective as well as injective. □


Note: The reasoning that we made to define the map $\epsilon$ occurs frequently. Suppose that a set
$\overline{S}$ is presented as the set of equivalence classes of an equivalence relation on a set S, and let
$\pi: S \to \overline{S}$ be the map that sends an element s to its equivalence class $\overline{s}$. A common way to
define a map $\epsilon$ from $\overline{S}$ to another set T is this: Given x in $\overline{S}$, one chooses an element s in S
such that $x = \overline{s}$, and defines $\epsilon(x)$ in terms of s. Then one must show, as we did above, that
the definition doesn’t depend on the choice of the element s whose equivalence class is x,
but only on x. This process is referred to as showing that the map is well defined. □


6.9 THE COUNTING FORMULA 


Let H be a subgroup of a finite group G. As we know, all cosets of H in G have the same
number of elements, and with the notation G/H for the set of cosets, the order |G/H| is
what is called the index [G : H] of H in G. The Counting Formula 2.8.8 becomes


(6.9.1) $|G| = |H|\,|G/H|$.

There is a similar formula for an orbit of any group operation:


Proposition 6.9.2 Counting Formula. Let S be a finite set on which a group G operates, and
let $G_s$ and $O_s$ be the stabilizer and orbit of an element s of S. Then


$|G| = |G_s|\,|O_s|$, or
(order of G) = (order of stabilizer) · (order of orbit).


This follows from (6.9.1) and Proposition (6.8.4). □


Thus the order of the orbit is equal to the index of the stabilizer,

(6.9.3) $|O_s| = [G : G_s]$,


and it divides the order of the group. There is one such formula for every element s of S.


Another formula uses the partition of the set S into orbits to count its elements. We
number the orbits that make up S arbitrarily, as $O_1, \ldots, O_k$. Then


(6.9.4) $|S| = |O_1| + |O_2| + \cdots + |O_k|$.

Formulas 6.9.2 and 6.9.4 have many applications.
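Both formulas can be tested on a small permutation group. The sketch below is an illustration with assumed data: G is the subgroup of $S_5$ generated by the transposition (1 2) and the 3-cycle (3 4 5), operating on the indices {1, ..., 5}. The tuple encoding and the function names are hypothetical.

```python
def act(g, s):
    """The operation of a permutation g (a tuple of images) on an index s."""
    return g[s - 1]

def multiply(g, h):
    """Product gh: apply h first, then g."""
    return tuple(g[h[i] - 1] for i in range(len(h)))

def generate(generators):
    """All products of the generators; for a finite group this is the subgroup they generate."""
    identity = tuple(range(1, len(generators[0]) + 1))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = multiply(s, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

x = (2, 1, 3, 4, 5)    # the transposition (1 2)
y = (1, 2, 4, 5, 3)    # the 3-cycle (3 4 5)
G = generate([x, y])
S = range(1, 6)

def orbit(s):
    return frozenset(act(g, s) for g in G)

def stabilizer(s):
    return [g for g in G if act(g, s) == s]

# Formula 6.9.2, once for every element s of S.
counting_formula_holds = all(len(G) == len(stabilizer(s)) * len(orbit(s)) for s in S)

# Formula 6.9.4: the orders of the distinct orbits add up to the order of S.
orbit_sizes = sorted(len(o) for o in {orbit(s) for s in S})
```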


Examples 6.9.5 (a) The group G of rotational symmetries of a regular dodecahedron
operates transitively on the set F of its faces. The stabilizer $G_f$ of a particular face f
is the group of rotations by multiples of $2\pi/5$ about the center of f; its order is 5. The
dodecahedron has 12 faces. Formula 6.9.2 reads $60 = 5 \cdot 12$, so the order of G is 60. Or, G
operates transitively on the set V of vertices. The stabilizer $G_v$ of a vertex v is the group of
order 3 of rotations by multiples of $2\pi/3$ about that vertex. A dodecahedron has 20 vertices,
so $60 = 3 \cdot 20$, which checks. There is a similar computation for edges: G operates transitively
on the set of edges, and the stabilizer of an edge e contains the identity and a rotation by $\pi$
about the center of e. So $|G_e| = 2$. Since $60 = 2 \cdot 30$, a dodecahedron has 30 edges.
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(b) We may also restrict an operation of a group G to a subgroup H. By restriction, an 
operation of G on a set S defines an operation of H on S, and this operation leads to more 
numerical relations. The H-orbit of an element s will be contained in the G-orbit of s, so a 
single G-orbit will be partitioned into H-orbits. 

For example, let F be the set of 12 faces of the dodecahedron, and let H be the 
stabilizer of a particular face f, a cyclic group of order 5. The order of any H-orbit is 
either 1 or 5. So when we partition the set F of 12 faces into H-orbits, we must find two 
orbits of order 1. We do: H fixes f and it fixes the face opposite to f. The remaining faces 
make two orbits of order 5. Formula 6.9.4 for the operation of the group H on the set 
of faces is 12 = 1 + 1 + 5 + 5. Or, let K denote the stabilizer of a vertex, a cyclic group
of order 3. We may also partition the set F into K-orbits. In this case Formula 6.9.4 is
12 = 3 + 3 + 3 + 3. □


6.10 OPERATIONS ON SUBSETS 


Suppose that a group G operates on a set S. If U is a subset of S of order r, 
(6.10.1) gU = {gu | u ∈ U}


is another subset of order r. This allows us to define an operation of G on the set of subsets 
of order r of S. The axioms for an operation are verified easily. 

For instance, let O be the octahedral group of 24 rotations of a cube, and let F
be the set of six faces of the cube. Then O also operates on the subsets of F of order
two, that is, on unordered pairs of faces. There are 15 pairs, and they form two orbits:
{pairs of opposite faces} ∪ {pairs of adjacent faces}. These orbits have orders 3 and 12,
respectively.

The stabilizer of a subset U is the set of group elements g such that [gU] = [U], which
is to say, gU = U. The stabilizer of a pair of opposite faces has order 8. 

Note this point once more: The stabilizer of U consists of the group elements such that 
gU = U. This means that g permutes the elements within U, that whenever u is in U, gu is 
also in U. 
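
The cube example can be checked by brute force. The sketch below rests on assumptions that are not in the text: the cube is taken to be [-1, 1]^3, its 24 rotations are modeled as the signed permutation matrices of determinant +1, and each face is identified with its outward unit normal. All names are hypothetical.

```python
from itertools import combinations, permutations, product

def det3(m):
    """Determinant of a 3x3 integer matrix stored as nested tuples."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cube_rotations():
    """Rotations of the cube [-1, 1]^3: signed permutation matrices with det +1."""
    rotations = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = tuple(tuple(signs[r] if perm[r] == c else 0 for c in range(3))
                      for r in range(3))
            if det3(m) == 1:
                rotations.append(m)
    return rotations

def apply(m, v):
    """Matrix times column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))

def act(g, U):
    """gU = {gu | u in U}, as in (6.10.1)."""
    return frozenset(apply(g, u) for u in U)

O = cube_rotations()

# A face is identified with its outward unit normal, one of +-e_1, +-e_2, +-e_3.
faces = [tuple(s if i == axis else 0 for i in range(3))
         for axis in range(3) for s in (1, -1)]
pairs = [frozenset(p) for p in combinations(faces, 2)]

# Orbits of O on the set of unordered pairs of faces.
orbits = {frozenset(act(g, U) for g in O) for U in pairs}
orbit_sizes = sorted(len(o) for o in orbits)

# Stabilizer of one pair of opposite faces: all g with gU = U.
opposite = frozenset({(1, 0, 0), (-1, 0, 0)})
stabilizer = [g for g in O if act(g, opposite) == opposite]

print(len(O), len(pairs), orbit_sizes, len(stabilizer))
```

Note that an element of the stabilizer may interchange the two faces of the pair; the condition tested is gU = U as a set, not gu = u for each u.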


6.11 PERMUTATION REPRESENTATIONS 


In this section we analyze the various ways in which a group G can operate on a set S.

• A permutation representation of a group G is a homomorphism from the group to a
symmetric group:


(6.11.1) φ: G → S_n.


Proposition 6.11.2 Let G be a group. There is a bijective correspondence between operations
of G on the set S = {1, ..., n} and permutation representations G → S_n:


{operations of G on S} ↔ {permutation representations G → S_n}.


Proof. This is very simple, though it can be confusing when one sees it for the first time. If
we are given an operation of G on S, we define a permutation representation φ by setting
φ(g) = m_g, multiplication by g (6.7.2). The associative property g(hi) = (gh)i shows that


m_g(m_h i) = g(hi) = (gh)i = m_{gh} i.


Hence φ is a homomorphism. Conversely, if φ is a permutation representation, the same
formula defines an operation of G on S. □


For example, the operation of the dihedral group D_n on the vertices (v_1, ..., v_n) of a
regular n-gon defines a homomorphism φ: D_n → S_n.


Proposition 6.11.2 has nothing to do with the fact that it works with a set of indices. If 
Perm(S) is the group of permutations of an arbitrary set S, we also call a homomorphism 
φ: G → Perm(S) a permutation representation of G.


Corollary 6.11.3 Let Perm(S) denote the group of permutations of a set S, and let G be a
group. There is a bijective correspondence between operations of G on S and permutation
representations φ: G → Perm(S):


{operations of G on S} ↔ {homomorphisms G → Perm(S)}. □


A permutation representation G — Perm(S) needn’t be injective. If it happens to be 
injective, one says that the corresponding operation is faithful. To be faithful, an operation 
must have the property that m_g, multiplication by g, is not the identity map unless g = 1:


(6.11.4) An operation is faithful if it has this property: 
The only element g of G such that gs = s for every s in S is the identity. 


The operation of the group of isometries M on the set S of equilateral triangles in the plane 
is faithful, because the only isometry that carries every equilateral triangle to itself is the 
identity. 

Permutation representations φ: G → Perm(S) are rarely surjective because the order
of Perm(S) tends to be very large. But one case is given in the next example. 


Example 6.11.5 The group GL_2(F_2) of invertible matrices with mod 2 coefficients is
isomorphic to the symmetric group S_3.

We denote the field F_2 by F and the group GL_2(F_2) by G. The space F^2 of column
vectors consists of four vectors:


0 = (0, 0)^t, e_1 = (1, 0)^t, e_2 = (0, 1)^t, e_1 + e_2 = (1, 1)^t.


The group G operates on the set of three nonzero vectors S = {e_1, e_2, e_1 + e_2}, and this
gives us a permutation representation φ: G → S_3. The identity is the only matrix that fixes
both e_1 and e_2, so the operation of G on S is faithful, and φ is injective. The columns of an
invertible matrix must be an ordered pair of distinct elements of S. There are six such pairs,
so |G| = 6. Since S_3 also has order six, φ is an isomorphism. □
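
The example is small enough to verify by enumeration. In the following sketch, the encoding of matrices as nested tuples, the indexing of the three nonzero vectors by 0, 1, 2, and the function names are all choices made for illustration, not notation from the text.

```python
from itertools import permutations, product

def det_mod2(m):
    """Determinant of a 2x2 matrix ((a, b), (c, d)), reduced mod 2."""
    (a, b), (c, d) = m
    return (a * d - b * c) % 2

def mat_vec(m, v):
    """Matrix times column vector over F_2."""
    (a, b), (c, d) = m
    return ((a * v[0] + b * v[1]) % 2, (c * v[0] + d * v[1]) % 2)

def mat_mul(m, n):
    """Matrix product over F_2."""
    return tuple(tuple(sum(m[i][k] * n[k][j] for k in range(2)) % 2
                       for j in range(2)) for i in range(2))

def compose(p, q):
    """Permutation 'p after q' on indices."""
    return tuple(p[q[i]] for i in range(len(q)))

# All invertible 2x2 matrices with entries in F_2.
G = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)
     if det_mod2(((a, b), (c, d))) == 1]

# The three nonzero vectors e_1, e_2, e_1 + e_2, indexed 0, 1, 2.
S = [(1, 0), (0, 1), (1, 1)]

def phi(m):
    """Permutation of the indices of S induced by multiplication by m."""
    return tuple(S.index(mat_vec(m, v)) for v in S)

images = {phi(m) for m in G}
is_homomorphism = all(phi(mat_mul(m, n)) == compose(phi(m), phi(n))
                      for m in G for n in G)
is_bijective = len(G) == 6 and images == set(permutations(range(3)))

print(len(G), is_homomorphism, is_bijective)
```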


6.12 FINITE SUBGROUPS OF THE ROTATION GROUP 


In this section, we apply the Counting Formula to classify the finite subgroups of SO_3, the
group of rotations of R^3. As happens with finite groups of isometries of the plane, all of them
are symmetry groups of familiar figures.


Theorem 6.12.1 A finite subgroup of SO_3 is one of the following groups:


C_k: the cyclic group of rotations by multiples of 2π/k about a line, with k arbitrary;
D_k: the dihedral group of symmetries of a regular k-gon, with k arbitrary;
T: the tetrahedral group of 12 rotational symmetries of a tetrahedron; 
O: the octahedral group of 24 rotational symmetries of a cube or an octahedron; 
I: the icosahedral group of 60 rotational symmetries of a dodecahedron or an icosahedron. 




Note: The dihedral groups are usually presented as groups of symmetry of a regular polygon
in the plane, where reflections reverse orientation. However, a reflection of a plane can be
achieved by a rotation through the angle π in three-dimensional space, and in this way the
symmetries of a regular polygon can be realized as rotations of R^3. The dihedral group D_n
can be generated by a rotation x with angle 2π/n about the e_1-axis and a rotation y with
angle π about the e_2-axis. With c = cos 2π/n and s = sin 2π/n, the matrices that represent
these rotations are


(6.12.2) x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & c & -s \\ 0 & s & c \end{pmatrix}, and y = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.
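
As a numerical sanity check, one can confirm that these two matrices satisfy the usual dihedral relations x^n = 1, y^2 = 1, yx = x^{-1}y. The sketch below uses plain Python lists and floating-point arithmetic, so equality is tested only up to a tolerance (the value 1e-9 is an arbitrary choice); the helper names are hypothetical.

```python
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def dihedral_generators(n):
    """The matrices x and y of (6.12.2) for a given n."""
    c, s = math.cos(2 * math.pi / n), math.sin(2 * math.pi / n)
    x = [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    y = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
    return x, y

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_pow(a, k):
    result = IDENTITY
    for _ in range(k):
        result = mat_mul(result, a)
    return result

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))

def check_relations(n):
    """Check x^n = 1, y^2 = 1, and yx = x^{-1}y up to rounding error."""
    x, y = dihedral_generators(n)
    x_inv = transpose(x)  # x is orthogonal, so its inverse is its transpose
    return (close(mat_pow(x, n), IDENTITY)
            and close(mat_mul(y, y), IDENTITY)
            and close(mat_mul(y, x), mat_mul(x_inv, y)))

print(all(check_relations(n) for n in range(2, 13)))
```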


Let G be a finite subgroup of SO_3, of order N > 1. We'll call a pole of an element
g ≠ 1 of G a pole of the group. Any rotation of R^3 except the identity has two poles, the
intersections of the axis of rotation with the unit sphere S^2. So a pole of G is a point on the
2-sphere that is fixed by a group element g different from 1.


Example 6.12.3. The group T of rotational symmetries of a tetrahedron Δ has order 12. Its
poles are the points of S^2 that lie above the centers of the faces, the vertices, and the centers
of the edges. Since Δ has four faces, four vertices, and six edges, there are 14 poles.


|poles| = 14 = |faces| + |vertices| + |edges|


Each of the 11 elements g ≠ 1 of T has two spins, that is, two pairs (g, p), where p is a pole of g.
So there are 22 spins altogether. The stabilizer of a face has order 3. Its two elements ≠ 1
share a pole above the center of a face. Similarly, there are two elements with a pole above
a vertex, and one element with a pole above the center of an edge.


|spins| = 22 = 2 |faces| + 2 |vertices| + |edges|


Let P denote the set of all poles of a finite subgroup G. We will get information about the 
group by counting these poles. As the example shows, the count can be confusing. 


Lemma 6.12.4 The set P of poles of G is a union of G-orbits. So G operates on P. 


Proof. Let p be a pole, say the pole of an element g ≠ 1 in G, let h be another element of G,
and let q = hp. We have to show that q is a pole, meaning that q is fixed by some element
g' of G other than the identity. The required element is hgh^{-1}. This element is not equal to
1 because g ≠ 1, and hgh^{-1}q = hgp = hp = q. □


The stabilizer G_p of a pole p is the group of all of the rotations about p that are in G.
It is a cyclic group, generated by the rotation of smallest positive angle θ. We'll denote its
order by r_p. Then θ = 2π/r_p.

Since p is a pole, the stabilizer G_p contains an element besides 1, so r_p > 1. The set of
elements of G with pole p is the stabilizer G_p, with the identity element omitted. So there
are r_p - 1 group elements that have p as pole. Every group element g except 1 has two
poles. Since |G| = N, there are 2N - 2 spins. This gives us the relation


(6.12.5) \sum_{p \in P} (r_p - 1) = 2(N - 1).


We collect terms to simplify the left side of this equation: Let n_p denote the order of the
orbit O_p of p. By the Counting Formula (6.9.2),


(6.12.6) r_p n_p = N.


If two poles p and p' are in the same orbit, their orbits are equal, so n_p = n_{p'}, and therefore
r_p = r_{p'}. We label the various orbits arbitrarily, say as O_1, O_2, ..., O_k, and we let n_i = n_p
and r_i = r_p for p in O_i, so that n_i r_i = N. Since the orbit O_i contains n_i elements, there are
n_i terms equal to r_i - 1 on the left side of (6.12.5). We collect those terms together. This
gives us the equation


\sum_{i=1}^{k} n_i (r_i - 1) = 2N - 2.


We divide both sides by N to get a famous formula:

(6.12.7) \sum_{i=1}^{k} (1 - 1/r_i) = 2 - 2/N.


This may not look like a promising tool, but in fact it tells us a great deal. The right side is 
between 1 and 2, while each term on the left is at least 1/2. It follows that there can be at most
three orbits. 
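
Before going through the cases, it may help to see (6.12.5) and (6.12.7) checked on one concrete group. The sketch below does this for the rotation group of a cube, under assumptions that are not part of the text: the cube is [-1, 1]^3, its rotations are the signed permutation matrices of determinant +1, and a pole is represented by the nonzero vector with entries in {-1, 0, 1} lying on the same ray from the origin (every rotation axis of the cube passes through a face center, an edge center, or a vertex). Exact rational arithmetic from the standard `fractions` module avoids rounding.

```python
from fractions import Fraction
from itertools import permutations, product

def det3(m):
    """Determinant of a 3x3 integer matrix stored as nested tuples."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cube_rotations():
    """Rotations of the cube [-1, 1]^3: signed permutation matrices with det +1."""
    rotations = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = tuple(tuple(signs[r] if perm[r] == c else 0 for c in range(3))
                      for r in range(3))
            if det3(m) == 1:
                rotations.append(m)
    return rotations

def apply(m, v):
    """Matrix times column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))

G = cube_rotations()
N = len(G)
identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

# Candidate pole directions: nonzero vectors with entries in {-1, 0, 1}.
candidates = [v for v in product((-1, 0, 1), repeat=3) if v != (0, 0, 0)]
poles = [p for p in candidates
         if any(g != identity and apply(g, p) == p for g in G)]

# r_p is the order of the stabilizer of p; each pole carries r_p - 1 spins.
r = {p: sum(1 for g in G if apply(g, p) == p) for p in poles}
spins = sum(r[p] - 1 for p in poles)

# One pair (n_i, r_i) per orbit of poles.
orbits = {frozenset(apply(g, p) for g in G) for p in poles}
data = sorted((len(o), r[next(iter(o))]) for o in orbits)

lhs = sum(Fraction(1) - Fraction(1, r_i) for _, r_i in data)
rhs = 2 - Fraction(2, N)

print(N, len(poles), spins, data, lhs, rhs)
```

The three pairs (n_i, r_i) found this way can be compared with case (ii) of the list (6.12.9).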

The rest of the classification is made by listing the possibilities: 


One orbit: 1 - 1/r = 2 - 2/N. This is impossible, because 1 - 1/r < 1, while 2 - 2/N ≥ 1.


Two orbits: (1 - 1/r_1) + (1 - 1/r_2) = 2 - 2/N, that is, 1/r_1 + 1/r_2 = 2/N.


Because r_i divides N, this equation holds only when r_1 = r_2 = N, and then n_1 = n_2 = 1.
There are two poles p_1 and p_2, both fixed by every element of the group. So G is the cyclic
group C_N of rotations whose axis of rotation is the line ℓ through p_1 and p_2.


Three orbits: (1 - 1/r_1) + (1 - 1/r_2) + (1 - 1/r_3) = 2 - 2/N.


This is the most interesting case. Since 2/N is positive, the formula implies that

(6.12.8) 1/r_1 + 1/r_2 + 1/r_3 > 1.


We arrange the r_i in increasing order. Then r_1 = 2: If all r_i were at least 3, the left side
would be ≤ 1.

Case 1: r_1 = r_2 = 2. The third order r_3 = k can be arbitrary, and N = 2k:
r_i = 2, 2, k; n_i = k, k, 2; N = 2k.


There is one pair of poles {p, p'} making the orbit O_3. Half of the elements of G fix p, and
the other half interchange p and p'. So the elements of G are rotations about the line ℓ
through p and p', or else they are rotations by π about a line perpendicular to ℓ. The group
G is the group of rotations fixing a regular k-gon Δ, the dihedral group D_k. The polygon Δ
lies in the plane perpendicular to ℓ, and the vertices and the centers of faces of Δ correspond
to the remaining poles. The bilateral symmetries of Δ in R^2 have become rotations through
the angle π in R^3.


Case 2: r_1 = 2 and 2 < r_2 ≤ r_3. The equation 1/2 + 1/4 + 1/4 = 1 rules out the possibility
that r_2 ≥ 4. Therefore r_2 = 3. Then the equation 1/2 + 1/3 + 1/6 = 1 rules out r_3 ≥ 6. Only
three possibilities remain:


(6.12.9)


(i) r_i = 2, 3, 3; n_i = 6, 4, 4; N = 12.
The poles in the orbit O_3 are the vertices of a regular tetrahedron, and G is the
tetrahedral group T of its 12 rotational symmetries.

(ii) r_i = 2, 3, 4; n_i = 12, 8, 6; N = 24.
The poles in the orbit O_3 are the vertices of a regular octahedron, and G is the
octahedral group O of its 24 rotational symmetries.

(iii) r_i = 2, 3, 5; n_i = 30, 20, 12; N = 60.
The poles in the orbit O_3 are the vertices of a regular icosahedron, and G is the
icosahedral group I of its 60 rotational symmetries.


In each case, the integers n_i are the numbers of edges, faces, and vertices, respectively.
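
The arithmetic of the three-orbit case can be reproduced by a short search. This is only a sketch of the numerical bookkeeping, not of the geometric identification of the groups; the function name and the cutoff `max_r3` are assumptions introduced here, the cutoff being needed because the family r_i = 2, 2, k is infinite.

```python
from fractions import Fraction

def three_orbit_solutions(max_r3):
    """List (r_1, r_2, r_3, N) with 2 <= r_1 <= r_2 <= r_3 <= max_r3 such that
    (1 - 1/r_1) + (1 - 1/r_2) + (1 - 1/r_3) = 2 - 2/N for an integer N
    divisible by each r_i."""
    solutions = []
    for r1 in range(2, max_r3 + 1):
        for r2 in range(r1, max_r3 + 1):
            for r3 in range(r2, max_r3 + 1):
                excess = Fraction(1, r1) + Fraction(1, r2) + Fraction(1, r3) - 1
                if excess <= 0:
                    continue  # inequality (6.12.8) fails
                N = 2 / excess  # from 1/r_1 + 1/r_2 + 1/r_3 - 1 = 2/N
                if N.denominator == 1 and all(N % r == 0 for r in (r1, r2, r3)):
                    solutions.append((r1, r2, r3, int(N)))
    return solutions

solutions = three_orbit_solutions(10)
dihedral = [s for s in solutions if s[:2] == (2, 2)]
exceptional = [s for s in solutions if s[:2] != (2, 2)]

# Orbit orders n_i = N / r_i for each exceptional solution.
orbit_orders = [tuple(N // r for r in (r1, r2, r3)) for r1, r2, r3, N in exceptional]

print(exceptional)
print(orbit_orders)
```

Raising the cutoff adds more members of the family 2, 2, k but no further exceptional triples, in agreement with the case analysis above.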

Intuitively, the poles in an orbit should be the vertices of a regular polyhedron because 
they must be evenly spaced on the sphere. However, this isn’t quite correct, because the 
centers of the edges of a cube, for example, form an orbit, but they do not span a regular 
polyhedron. The figure they span is called a truncated polyhedron. 

We'll verify the assertion of (iii). Let V be the orbit O_3 of order twelve. We want to
show that the poles in this orbit are the vertices of a regular icosahedron. Let p be one of
the poles in V. Thinking of p as the north pole of the unit sphere gives us an equator and
a south pole. Let H be the stabilizer of p. Since r_3 = 5, this is a cyclic group, generated by
a rotation x about p with angle 2π/5. When we decompose V into H-orbits, we must get
at least two H-orbits of order 1. These are the north and south poles. The ten other poles
making up V form two H-orbits of order 5. We write them as {q_0, ..., q_4} and {q'_0, ..., q'_4},
where q_i = x^i q_0 and q'_i = x^i q'_0. By symmetry between the north and south poles, one of
these H-orbits is in the northern hemisphere and one is in the southern hemisphere, or else
both are on the equator. Let's say that the orbit {q_i} is in the northern hemisphere or on the
equator.

Let |x, y| denote the spherical distance between points x and y on the unit sphere. We
note that d = |p, q_i| is independent of i = 0, ..., 4, because there is an element of H that
carries q_0 to q_i, while fixing p. Similarly, d' = |p, q'_i| is independent of i. So as p' ranges over
the orbit V the distance |p, p'| takes on only four values 0, d, d' and π. The values d and d'
are taken on five times each, and 0 and π are taken on once. Since G operates transitively
on V, we will obtain the same four values when p is replaced by any other pole in V.

We note that d ≤ π/2 while d' ≥ π/2. Because there are five poles in the orbit {q_i},
the spherical distance |q_i, q_{i+1}| is less than π/2, so it is equal to d, and d < π/2. Therefore
that orbit isn't on the equator. The three poles p, q_i, q_{i+1} form an equilateral triangle. There
are five congruent equilateral triangles meeting at p, and therefore five congruent triangles
meet at each pole. They form the faces of an icosahedron.


Note: There are just five regular polyhedra. This can be proved by counting the number of 
ways that one can begin to build one by bringing congruent regular polygons together at a 
vertex. One can assemble three, four, or five equilateral triangles, three squares, or three 
regular pentagons. (Six triangles, four squares, or three hexagons glue together into flat 
surfaces.) So there are just five possibilities. But this analysis omits the interesting question 
of existence. Does an icosahedron exist? Of course, we can build one out of cardboard. But 
when we do, the triangles never fit together precisely, and we take it on faith that this is due 
to our imprecision. If we drew the analogous conclusion about the circle of fifths in music, 
we’d be wrong: the circle of fifths almost closes up, but not quite. The best way to be sure 
that the icosahedron exists may be to write down the coordinates of its vertices and check 
the distances. This is Exercise 12.7. □


Our discussion of the isometries of the plane has analogues for the group of isometries 
of three-space. One can define the notion of a crystallographic group, a discrete subgroup 
whose translation group is a three-dimensional lattice. The crystallographic groups are anal- 
ogous to two-dimensional lattice groups, and crystals form examples of three-dimensional 
configurations having such groups as symmetry. It can be shown that there are 230 types of 
crystallographic groups, analogous to the 17 lattice groups (6.6.2). This is too long a list to 
be useful, so crystals have been classified more crudely into seven crystal systems. For more
about this, and for a discussion of the 32 crystallographic point groups, look in a book on 
crystallography, such as [Schwarzenbach]. 


Un bon héritage vaut mieux que le plus joli problème
de géométrie, parce qu'il tient lieu de méthode
générale, et sert à résoudre bien des problèmes.


—Gottfried Wilhelm Leibnitz²


² I learned this quote from V.I. Arnold. L'Hôpital had written to Leibniz, apologizing for a long silence, and
saying that he had been in the country taking care of an inheritance. In his reply, Leibniz told him not to worry, and
continued with the sentence quoted.

EXERCISES


Section 1 Symmetry of Plane Figures 


1.1. Determine all symmetries of Figures 6.1.4, 6.1.6, and 6.1.7.


Section 3 Isometries of the Plane 
3.1. Verify the rules (6.3.3). 
3.2. Let m be an orientation-reversing isometry. Prove algebraically that m^2 is a translation.
3.3. Prove that a linear operator on R^2 is a reflection if and only if its eigenvalues are 1 and -1,
and the eigenvectors with these eigenvalues are orthogonal.


3.4. Prove that a conjugate of a glide reflection in M is a glide reflection, and that the glide
vectors have the same length.
3.5. Write formulas for the isometries (6.3.1) in terms of a complex variable z = x + iy.
3.6. (a) Let s be the rotation of the plane with angle π/2 about the point (1, 1)^t. Write the
formula for s as a product t_a ρ_θ.


(b) Let s denote reflection of the plane about the vertical axis x = 1. Find an isometry g
such that grg^{-1} = s, and write s in the form t_a ρ_θ r.


Section 4 Finite Groups of Orthogonal Operators on the Plane 


4.1. Write the product x^2 y x^{-1} y^{-1} x^3 y^3 in the form x^i y^j in the dihedral group D_n.
4.2. (a) List all subgroups of the dihedral group D4, and decide which ones are normal. 


(b) List the proper normal subgroups N of the dihedral group D_15, and identify the
quotient groups D_15/N.


(c) List the subgroups of D_6 that do not contain x^3.

4.3. (a) Compute the left cosets of the subgroup H = {1, x^5} in the dihedral group D_10.
(b) Prove that H is normal and that D_10/H is isomorphic to D_5.
(c) Is D_10 isomorphic to D_5 × H?


Section 5 Discrete Groups of Isometries 


5.1. Let ℓ_1 and ℓ_2 be lines through the origin in R^2 that intersect in an angle π/n, and let r_i be
the reflection about ℓ_i. Prove that r_1 and r_2 generate a dihedral group D_n.


5.2. What is the crystallographic restriction for a discrete group of isometries whose translation
group L has the form Za with a ≠ 0?


5.3. How many sublattices of index 3 are contained in a lattice L in R^2?


5.4. Let (a, b) be a lattice basis of a lattice L in R^2. Prove that every other lattice basis has the
form (a', b') = (a, b)P, where P is a 2×2 integer matrix with determinant ±1.


5.5. Prove that the group of symmetries of the frieze pattern [pattern figure not reproduced] is isomorphic to the
direct product C_2 × C_∞ of a cyclic group of order 2 and an infinite cyclic group.


5.6. Let G be the group of symmetries of the frieze pattern [pattern figure not reproduced]. Determine
the point group \bar{G} of G, and the index in G of its subgroup of translations.

5.7. Let N denote the group of isometries of a line R^1. Classify discrete subgroups of N,
identifying those that differ in the choice of origin and unit length on the line. 


*5.8. Let N' be the group of isometries of an infinite ribbon


R = {(x, y) | -1 ≤ y ≤ 1}.


It can be viewed as a subgroup of the group M. The following elements are in N':


t_a: (x, y) → (x + a, y)
s: (x, y) → (-x, y)
r: (x, y) → (x, -y)
ρ: (x, y) → (-x, -y).


(a) State and prove analogues of (6.3.3) for these isometries. 


(b) A frieze pattern is a pattern on the ribbon that is periodic and whose group of 
symmetries is discrete. Classify the corresponding symmetry groups, identifying those 
that differ in the choice of origin and unit length on the ribbon. Begin by making some 
patterns with different symmetries. Make a careful case analysis when proving your 
results. 


5.9. Let G be a discrete subgroup of M whose translation group is not trivial. Prove that
there is a point p_0 in the plane that is not fixed by any element of G except the
identity.

5.10. Let f and g be rotations of the plane about distinct points, with arbitrary nonzero
angles of rotation θ and φ. Prove that the group generated by f and g contains a
translation.


5.11. If S and S' are subsets of R^n with S ⊂ S', then S is dense in S' if for every element s' of S',
there are elements of S arbitrarily near to s'.
(a) Prove that a subgroup Γ of R^+ is either dense in R, or else discrete.


(b) Prove that the subgroup of R^+ generated by 1 and √2 is dense in R^+.
(c) Let H be a subgroup of the group G of angles. Prove that H is either a cyclic subgroup
of G or else it is dense in G.


5.12. Classify discrete subgroups of the additive group R^{3+}.


Section 6 Plane Crystallographic Groups


6.1. (a) Determine the point group \bar{G} for each of the patterns depicted in Figure (6.6.2).
(b) For which of the patterns can coordinates be chosen so that the group G operates on 
the lattice L? 
6.2. Let G be the group of symmetries of an equilateral triangular lattice L. Determine the 
index in G of the subgroup of translations in G. 
6.3. With each of the patterns shown, determine the point group and find a pattern with the 
same type of symmetry in Table 6.6.2. 
*6.4. Classify plane crystallographic groups with point group D_1 = {1, r}.
6.5. (a) Prove that if the point group of a two-dimensional crystallographic group G is C_6 or
D_6, the translation group L is an equilateral triangular lattice.
(b) Classify those groups. 


*6.6. Prove that symmetry groups of the figures in Figure 6.6.2 exhaust the possibilities. 


Section 7 Abstract Symmetry: Group Operations
7.1. Let G = D_4 be the dihedral group of symmetries of the square.


(a) What is the stabilizer of a vertex? of an edge? 
(b) G operates on the set of two elements consisting of the diagonal lines. What is the 
stabilizer of a diagonal? 


7.2. The group M of isometries of the plane operates on the set of lines in the plane. Determine
the stabilizer of a line. 

7.3. The symmetric group S3 operates on two sets U and V of order 3. Decompose the product 
set U × V into orbits for the “diagonal action” g(u, v) = (gu, gv), when


(a) the operations on U and V are transitive, 
(b) the operation on U is transitive, the orbits for the operation on V are {v1} and {v2, v3}. 


7.4. In each of the figures in Exercise 6.3, find the points that have nontrivial stabilizers, and
identify the stabilizers. 


7.5. Let G be the group of symmetries of a cube, including the orientation-reversing symmetries. 
Describe the elements of G geometrically. 

7.6. Let G be the group of symmetries of an equilateral triangular prism P, including the
orientation-reversing symmetries. Determine the stabilizer of one of the rectangular faces 
of P and the order of the group. 


7.7. Let G = GL_n(R) operate on the set V = R^n by left multiplication.


(a) Describe the decomposition of V into orbits for this operation.
(b) What is the stabilizer of e_1?


7.8. Decompose the set C^{2×2} of 2×2 complex matrices into orbits for the following operations
of GL_2(C): (a) left multiplication, (b) conjugation.


7.9. (a) Let S be the set R^{m×n} of real m×n matrices, and let G = GL_m(R) × GL_n(R). Prove
that the rule (P, Q) * A = PAQ^{-1} defines an operation of G on S.
(b) Describe the decomposition of S into G-orbits.
(c) Assume that m ≤ n. What is the stabilizer of the matrix [I | 0]?


7.10. (a) Describe the orbit and the stabilizer of the matrix [2×2 matrix illegible in source] under conjugation in the
general linear group GL_n(R).


(b) Interpreting the matrix in GL_2(F_5), find the order of the orbit.


7.11. Prove that the only subgroup of order 12 of the symmetric group S_4 is the alternating
group A_4.


Section 8 The Operation on Cosets 


8.1. Does the rule P * A = PAP^t define an operation of GL_n on the set of n×n matrices?

8.2. What is the stabilizer of the coset [aH] for the operation of G on G/H?

8.3. Exhibit the bijective map (6.8.4) explicitly, when G is the dihedral group D_4 and S is the
set of vertices of a square.


8.4. Let H be the stabilizer of the index 1 for the operation of the symmetric group G = S_n
on the set of indices {1, ..., n}. Describe the left cosets of H in G and the map (6.8.4) in
this case.


Section 9 The Counting Formula


9.1. Use the counting formula to determine the orders of the groups of rotational symmetries
of a cube and of a tetrahedron.


9.2. Let G be the group of rotational symmetries of a cube, let G_v, G_e, G_f be the stabilizers
of a vertex v, an edge e, and a face f of the cube, and let V, E, F be the sets of vertices,
edges, and faces, respectively. Determine the formulas that represent the decomposition
of each of the three sets into orbits for each of the subgroups.


9.3. Determine the order of the group of symmetries of a dodecahedron, when
orientation-reversing symmetries such as reflections in planes are allowed.

9.4. Identify the group T' of all symmetries of a regular tetrahedron, including
orientation-reversing symmetries.

9.5. Let F be a section of an I-beam, which one can think of as the product set of the letter
I and the unit interval. Identify its group of symmetries, orientation-reversing symmetries
included.


9.6. Identify the group of symmetries of a baseball, taking the seam (but not the stitching) into
account and allowing orientation-reversing symmetries.


Section 10 Operations on Subsets


10.1. Determine the orders of the orbits for left multiplication on the set of subsets of order 3 of
D_3.

10.2. Let S be a finite set on which a group G operates transitively, and let U be a subset of S.
Prove that the subsets gU cover S evenly, that is, that every element of S is in the same
number of sets gU.


10.3. Consider the operation of left multiplication by G on the set of its subsets. Let U be a
subset such that the sets gU partition G. Let H be the unique subset in this orbit that
contains 1. Prove that H is a subgroup of G.


Section 11 Permutation Representations


11.1. Describe all ways in which S_3 can operate on a set of four elements.

11.2. Describe all ways in which the tetrahedral group T can operate on a set of two elements.


11.3. Let S be a set on which a group G operates, and let H be the subset of elements g such
that gs = s for all s in S. Prove that H is a normal subgroup of G.


11.4. Let G be the dihedral group D_4 of symmetries of a square. Is the action of G on the
vertices a faithful action? on the diagonals?


11.5. A group G operates faithfully on a set S of five elements, and there are two orbits, one of
order 3 and one of order 2. What are the possible groups?
Hint: Map G to a product of symmetric groups.


11.6. Let F = F_3. There are four one-dimensional subspaces of the space of column vectors
F^2. List them. Left multiplication by an invertible matrix permutes these subspaces. Prove
that this operation defines a homomorphism φ: GL_2(F) → S_4. Determine the kernel and
image of this homomorphism.


11.7. For each of the following groups, find the smallest integer n such that the group has a
faithful operation on a set of order n: (a) D_4, (b) D_6, (c) the quaternion group H.


11.8. Find a bijective correspondence between the multiplicative group F_p^× and the set of
automorphisms of a cyclic group of order p.


11.9. Three sheets of rectangular paper S_1, S_2, S_3 are made into a stack. Let G be the group of
all symmetries of this configuration, including symmetries of the individual sheets as well
as permutations of the set of sheets. Determine the order of G, and the kernel of the map
G → S_3 defined by the permutations of the set {S_1, S_2, S_3}.


Section 12 Finite Subgroups of the Rotation Group


12.1. Explain why the groups of symmetries of the dodecahedron and the icosahedron are
isomorphic.


12.2. Describe the orbits of poles for the group of rotations of an octahedron.


12.3. Let O be the group of rotations of a cube, and let S be the set of four diagonal lines
connecting opposite vertices. Determine the stabilizer of one of the diagonals.


12.4. Let G = O be the group of rotations of a cube, and let H be the subgroup carrying one of
the two inscribed tetrahedra to itself (see Exercise 3.4). Prove that H = T.


12.5. Prove that the icosahedral group has a subgroup of order 10.

12.6. Determine all subgroups of (a) the tetrahedral group, (b) the icosahedral group.

12.7. The 12 points (±1, ±α, 0)^t, (0, ±1, ±α)^t, (±α, 0, ±1)^t form the vertices of a regular
icosahedron if α > 1 is chosen suitably. Verify this, and determine α.

*12.8. Prove the crystallographic restriction for three-dimensional crystallographic groups:
A rotational symmetry of a crystal has order 2, 3, 4, or 6.


Miscellaneous Problems


*M.1. Let G be a two-dimensional crystallographic group such that no element g ≠ 1 fixes any
point of the plane. Prove that G is generated by two translations, or else by one translation
and one glide.

M.2. (a) Prove that the set Aut G of automorphisms of a group G forms a group, the law of
composition being composition of functions.

(b) Prove that the map φ: G → Aut G defined by g ↦ (conjugation by g) is a
homomorphism, and determine its kernel.

(c) The automorphisms that are obtained as conjugation by a group element are called
inner automorphisms. Prove that the set of inner automorphisms, the image of φ, is a
normal subgroup of the group Aut G.


M.3. Determine the groups of automorphisms (see Exercise M.2) of the group
(a) C_4, (b) C_6, (c) C_2 × C_2, (d) D_4, (e) the quaternion group H.

*M.4. With coordinates x_1, ..., x_n in R^n as usual, the set of points defined by the
inequalities -1 ≤ x_i ≤ +1, for i = 1, ..., n, is an n-dimensional hypercube C_n. The
1-dimensional hypercube is a line segment and the 2-dimensional hypercube is a square.
The 4-dimensional hypercube has eight face cubes, the 3-dimensional cubes defined by
{x_i = 1} and by {x_i = -1}, for i = 1, 2, 3, 4, and it has 16 vertices (±1, ±1, ±1, ±1).

Let G_n denote the subgroup of the orthogonal group O_n of elements that send the
hypercube to itself, the group of symmetries of C_n, including the orientation-reversing
symmetries. Permutations of the coordinates and sign changes are among the
elements of G_n.


(a) Use the counting formula and induction to determine the order of the group G_n.


(b) Describe G_n explicitly, and identify the stabilizer of the vertex (1, ..., 1). Check your
answer by showing that G_2 is isomorphic to the dihedral group D_4.


*M.5. (a) Find a way to determine the area of one of the hippo heads that make up the first
pattern in Figure 6.6.2. Do the same for one of the fleurs-de-lys in the pattern at the
bottom of the figure.

(b) A fundamental domain D for a plane crystallographic group is a bounded region of the
plane such that the images gD, g in G, cover the plane exactly once, without overlap.
Find two noncongruent fundamental domains for the group of symmetries of the hippo
pattern. Do the same for the fleur-de-lys pattern.

(c) Prove that if D and D' are fundamental domains for the same pattern, then D can be
cut into finitely many pieces and reassembled to form D'.

(d) Find a formula relating the area of a fundamental domain to the order of the point
group of the pattern.


*M.6. Let G be a discrete subgroup of M. Choose a point p in the plane whose stabilizer in G 
is trivial, and let S be the orbit of p. For every point q of S other than p, let $\ell_q$ be the line 
that is the perpendicular bisector of the line segment [p, q], and let $H_q$ be the half plane 
that is bounded by $\ell_q$ and that contains p. Prove that $D = \bigcap H_q$ is a fundamental domain 
for G (see Exercise M.5). 


*M.7. Let G be a finite group operating on a finite set S. For each element g of G, let $S^g$ denote 
the subset of elements of S fixed by g: $S^g = \{s \in S \mid gs = s\}$, and let $G_s$ be the stabilizer 
of s. 


(a) We may imagine a true-false table for the assertion that gs = s, say with rows indexed 
by elements of G and columns indexed by elements of S. Construct such a table for 
the action of the dihedral group $D_3$ on the vertices of a triangle. 

(b) Prove the formula $\sum_{s \in S} |G_s| = \sum_{g \in G} |S^g|$. 


(c) Prove Burnside’s Formula: $|G| \cdot (\text{number of orbits}) = \sum_{g \in G} |S^g|$. 
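
The true-false table of part (a) and the two counts in parts (b) and (c) can be checked numerically. The following Python sketch does this for $D_3$ acting on the three vertices of a triangle. The encoding of group elements as tuples and the helper names `compose`, `generate`, and `count_orbits` are assumptions made for this illustration; they are not notation from the text, and the check is not a proof.

```python
def compose(a, b):
    """Return a∘b: apply b first, then a (permutations stored as tuples)."""
    return tuple(a[b[i]] for i in range(len(b)))


def generate(gens):
    """Closure of a list of permutations under composition (finite group)."""
    identity = tuple(range(len(gens[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def count_orbits(group, n):
    """Number of orbits of the group on the set {0, ..., n-1}."""
    seen, count = set(), 0
    for s in range(n):
        if s not in seen:
            count += 1
            seen.update(g[s] for g in group)
    return count


# D3 acting on the vertices 0, 1, 2 of a triangle.
rotation = (1, 2, 0)      # vertex i goes to vertex i + 1 (mod 3)
reflection = (0, 2, 1)    # fixes vertex 0, swaps vertices 1 and 2
D3 = generate([rotation, reflection])

# True-false table: rows indexed by g, columns by s, entry "gs = s".
table = {g: [g[s] == s for s in range(3)] for g in D3}

fixed_point_sum = sum(sum(row) for row in table.values())              # sum of |S^g|
stabilizer_sum = sum(sum(table[g][s] for g in D3) for s in range(3))   # sum of |G_s|
orbit_count = count_orbits(D3, 3)

print(fixed_point_sum, stabilizer_sum, len(D3) * orbit_count)
```

Summing the table by rows gives the right side of Burnside’s Formula, and summing it by columns gives the sum of the stabilizer orders, which is why the two totals agree.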


M.8. There are $70 = \binom{8}{4}$ ways to color the edges of an octagon, with four black and four white. 
The group $D_8$ operates on this set of 70, and the orbits represent equivalent colorings. Use 
Burnside’s Formula (see Exercise M.7) to count the number of equivalence classes. 


CHAPTER 7 


More Group Theory 


The more to do or to prove, the easier the doing or the proof. 


—James Joseph Sylvester 


We discuss three topics in this chapter: conjugation, the most important group operation; 
the Sylow Theorems, which describe subgroups of prime power order in a finite group; and 
generators and relations for a group. 


7.1 CAYLEY’S THEOREM 


Every group G operates on itself in several ways, left multiplication being one of them: 


(7.1.1)    $G \times G \to G, \quad (g, x) \mapsto gx$. 


This is a transitive operation: there is just one orbit. The stabilizer of any element is the 
trivial subgroup $\langle 1 \rangle$, so the operation is faithful, and the permutation representation 


(7.1.2)    $G \to \mathrm{Perm}(G), \quad g \mapsto m_g$, where $m_g$ is left multiplication by g, 

defined by this operation is injective (see Section 6.11). 


Theorem 7.1.3 Cayley’s Theorem. Every finite group is isomorphic to a subgroup of a 
permutation group. A group of order n is isomorphic to a subgroup of the symmetric 
group $S_n$. 


Proof. Since the operation by left multiplication is faithful, G is isomorphic to its image in 
Perm(G). If G has order n, Perm(G) is isomorphic to $S_n$. □


Cayley’s Theorem is interesting, but it is difficult to use because the order of $S_n$, which is n!, is 
usually too large in comparison with n. 
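
To make the permutation representation concrete, here is a minimal Python sketch that computes the left-multiplication maps $m_g$ for the symmetric group on three letters and checks that $g \mapsto m_g$ is an injective homomorphism. The choice of $S_3$ as the test group, the tuple encoding, and the names `compose`, `left_mult`, and `image` are assumptions made for this illustration.

```python
from itertools import permutations


def compose(a, b):
    """Return a∘b: apply b first, then a (permutations stored as tuples)."""
    return tuple(a[b[i]] for i in range(len(b)))


# The group G = S3, listed in a fixed order; its elements get indices 0..5.
G = list(permutations(range(3)))
index = {g: k for k, g in enumerate(G)}


def left_mult(g):
    """m_g, written as a permutation of the indices 0..5 of the elements of G."""
    return tuple(index[compose(g, x)] for x in G)


image = {g: left_mult(g) for g in G}

# Each m_g is a permutation of the six elements of G.
all_permutations = all(sorted(m) == list(range(len(G))) for m in image.values())

# Faithful operation: distinct elements give distinct permutations.
injective = len(set(image.values())) == len(G)

# Homomorphism: m_{gh} = m_g ∘ m_h.
homomorphism = all(
    image[compose(g, h)] == compose(image[g], image[h]) for g in G for h in G
)

print(all_permutations, injective, homomorphism)
```

Here a group of order 6 is embedded in the permutations of a 6-element set, a group of order 720, which illustrates why the theorem is hard to use in practice.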


7.2 THE CLASS EQUATION 

Conjugation, the operation of G on itself defined by 

(7.2.1)    $(g, x) \mapsto gxg^{-1}$, 

is more subtle and more important than left multiplication. Obviously, we shouldn’t use 
multiplicative notation for this operation. We’ll verify the associative law (6.7.1) for the 
operation, using $g * x$ as a temporary notation for the conjugate $gxg^{-1}$: 


$(gh) * x = (gh)x(gh)^{-1} = ghxh^{-1}g^{-1} = g(h * x)g^{-1} = g * (h * x)$. 

Having checked the axiom, we return to the usual notation $gxg^{-1}$. 


• The stabilizer of an element x of G for the operation of conjugation is called the centralizer 
of x. It is often denoted by Z(x): 


(7.2.2)    $Z(x) = \{g \in G \mid gxg^{-1} = x\} = \{g \in G \mid gx = xg\}$. 

The centralizer of x is the set of elements that commute with x. 


• The orbit of x for conjugation is called the conjugacy class of x, and is often denoted by 
C(x). It consists of all of the conjugates $gxg^{-1}$: 


(7.2.3)    $C(x) = \{x' \in G \mid x' = gxg^{-1} \text{ for some } g \text{ in } G\}$. 

The counting formula (6.9.2) tells us that 

(7.2.4)    $|G| = |Z(x)| \cdot |C(x)|$ 
           |G| = |centralizer| · |conj. class| 

The center Z of a group G was defined in Chapter 2. It is the set of elements that 
commute with every element of the group: $Z = \{z \in G \mid zy = yz \text{ for all } y \text{ in } G\}$. 


Proposition 7.2.5 


(a) The centralizer Z(x) of an element x of G contains x, and it contains the center Z. 

(b) An element x of G is in the center if and only if its centralizer Z(x) is the whole group 
G, and this happens if and only if the conjugacy class C(x) consists of the element x 
alone. □


Since the conjugacy classes are orbits for a group operation, they partition the group. 
This fact gives us the class equation of a finite group: 


(7.2.6)    $|G| = \sum_{\text{conjugacy classes } C} |C|$. 

If we number the conjugacy classes, writing them as $C_1, \ldots, C_k$, the class equation reads 

(7.2.7)    $|G| = |C_1| + \cdots + |C_k|$. 


The conjugacy class of the identity element 1 consists of that element alone. It seems natural 
to list that class first, so that $|C_1| = 1$. The other occurrences of 1 on the right side of the class 
equation correspond to the other elements of the center Z of G. Note also that each term on the 
right side divides the left side, because it is the order of an orbit. 


(7.2.8)    The numbers on the right side of the class equation divide the 
           order of the group, and at least one of them is equal to 1. 

This is a strong restriction on the combinations of integers that may occur in such an 
equation. 


The symmetric group $S_3$ has order 6. With our usual notation, the element x has order 
3. Its centralizer Z(x) contains x, so its order is 3 or 6. Since $yx = x^2y$, x is not in the center 
of the group, and |Z(x)| = 3. It follows that $Z(x) = \langle x \rangle$, and the counting formula (7.2.4) 
shows that the conjugacy class C(x) has order 2. Similar reasoning shows that the conjugacy 
class C(y) of the element y has order 3. The class equation of the symmetric group $S_3$ is 


(7.2.9)    6 = 1 + 2 + 3. 
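
For a group this small the class equation can also be confirmed by brute force. The sketch below, in Python, computes the conjugacy classes of $S_3$ and the centralizer of an element of order 3, and checks the counting formula (7.2.4). The tuple encoding of permutations and the names `compose`, `inverse`, and `conjugacy_classes` are assumptions of this illustration.

```python
from itertools import permutations


def compose(a, b):
    """Return a∘b: apply b first, then a (permutations stored as tuples)."""
    return tuple(a[b[i]] for i in range(len(b)))


def inverse(a):
    """Inverse permutation of a."""
    inv = [0] * len(a)
    for i, image in enumerate(a):
        inv[image] = i
    return tuple(inv)


def conjugacy_classes(group):
    """Partition the group into orbits for the operation of conjugation."""
    remaining = set(group)
    classes = []
    while remaining:
        x = next(iter(remaining))
        cls = {compose(compose(g, x), inverse(g)) for g in group}
        classes.append(cls)
        remaining -= cls
    return classes


S3 = list(permutations(range(3)))
class_sizes = sorted(len(c) for c in conjugacy_classes(S3))

# An element of order 3 (a 3-cycle) and its centralizer.
x = (1, 2, 0)
centralizer = [g for g in S3 if compose(g, x) == compose(x, g)]
class_of_x = {compose(compose(g, x), inverse(g)) for g in S3}

print(class_sizes, len(centralizer), len(class_of_x))
```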


As we see, the counting formula helps to determine the class equation. One can 
determine the order of a conjugacy class directly, or one can compute the order of its 
centralizer. The centralizer, being a subgroup, has more structure, and computing its order is 
often the better way. We will see a case in which it is easier to determine the conjugacy classes 
in the next section, but let’s look at another case in which one should use the centralizer. 

Let G be the special linear group $SL_2(\mathbb{F}_3)$ of matrices of determinant 1 with entries 
in the field $\mathbb{F}_3$. The order of this group is 24 (see Exercise 4.4). To start computing the 
class equation by listing the elements of G would be incredibly boring. It is better to begin 
by computing the centralizers of a few matrices A. This is done by solving the equation 
PA = AP, for the matrix P. It is easier to use this equation, rather than $PAP^{-1} = A$. For 
instance, let 

$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ and $P = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. 

The equation PA = AP imposes the conditions b = -c and a = d, and then the equation 
det P = 1 becomes $a^2 + c^2 = 1$. This equation has four solutions in $\mathbb{F}_3$: $a = \pm 1$, $c = 0$ and 
$a = 0$, $c = \pm 1$. So |Z(A)| = 4 and |C(A)| = 6. This gives us a start for the class equation: 
$24 = 1 + 6 + \cdots$. To finish the computation, one needs to compute centralizers of a few more 
matrices. Since conjugate elements have the same characteristic polynomial, one can begin 
by choosing elements with different characteristic polynomials. 

The class equation of $SL_2(\mathbb{F}_3)$ is 


(7.2.10)    24 = 1 + 1 + 4 + 4 + 4 + 4 + 6. 
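
Although listing the 24 elements by hand is tedious, a machine does it instantly. The Python sketch below enumerates $SL_2(\mathbb{F}_3)$, computes the centralizer of the matrix with rows (0, -1) and (1, 0), and tabulates the sizes of all conjugacy classes. Matrices are stored as 4-tuples (a, b, c, d) reduced modulo 3; this encoding and the names `mul`, `inv`, and `conj_class` are assumptions of the illustration.

```python
from itertools import product

P_MOD = 3  # we work over the field F_3


def mul(A, B):
    """Product of 2x2 matrices stored as (a, b, c, d), reduced mod P_MOD."""
    a, b, c, d = A
    e, f, g, h = B
    return ((a * e + b * g) % P_MOD, (a * f + b * h) % P_MOD,
            (c * e + d * g) % P_MOD, (c * f + d * h) % P_MOD)


def inv(A):
    """Inverse of a determinant-1 matrix: the adjugate, reduced mod P_MOD."""
    a, b, c, d = A
    return (d % P_MOD, -b % P_MOD, -c % P_MOD, a % P_MOD)


# All matrices of determinant 1 with entries in F_3.
SL2 = [M for M in product(range(P_MOD), repeat=4)
       if (M[0] * M[3] - M[1] * M[2]) % P_MOD == 1]

A = (0, 2, 1, 0)  # rows (0, -1) and (1, 0), since -1 = 2 in F_3

# Centralizer of A: solve PA = AP by exhaustive search.
centralizer = [P for P in SL2 if mul(P, A) == mul(A, P)]


def conj_class(x):
    """Conjugacy class of x in SL2, as a frozenset."""
    return frozenset(mul(mul(g, x), inv(g)) for g in SL2)


class_sizes = sorted(len(c) for c in {conj_class(x) for x in SL2})

print(len(SL2), len(centralizer), len(conj_class(A)), class_sizes)
```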


7.3 p-GROUPS 


The class equation has several applications to groups whose orders are positive powers of a 
prime p. They are called p-groups. 


Proposition 7.3.1 The center of a p-group is not the trivial group. 


Proof. Say that $|G| = p^e$ with $e \ge 1$. Every term on the right side of the class equation 
divides $p^e$, so it is a power of p too, possibly $p^0 = 1$. The positive powers of p are divisible 
by p. If the class $C_1$ of the identity made the only contribution of 1 to the right side, the 
equation would read 


$p^e = 1 + \sum (\text{multiples of } p)$. 


This is impossible, so there must be more 1’s on the right. The center is not trivial. □

A similar argument can be used to prove the following theorem for operations of 
p-groups. We’ll leave its proof as an exercise. 


Theorem 7.3.2 Fixed Point Theorem. Let G be a p-group, and let S be a finite set on which 
G operates. If the order of S is not divisible by p, there is a fixed point for the operation of 
G on S, that is, an element s whose stabilizer is the whole group. □


Proposition 7.3.3 Every group of order $p^2$ is abelian. 


Proof. Let G be a group of order $p^2$. According to Proposition 7.3.1, its center Z is 
not the trivial group. So the order of Z must be p or $p^2$. If the order of Z is $p^2$, then Z = G, 
and G is abelian as the proposition asserts. Suppose that the order of Z is p, and let x be an 
element of G that is not in Z. The centralizer Z(x) contains x as well as Z, so it is strictly 
larger than Z. Since |Z(x)| divides |G|, it must be equal to $p^2$, and therefore Z(x) = G. 
This means that x commutes with every element of G, so it is in the center after all, which is 
a contradiction. Therefore the center cannot have order p. □


Corollary 7.3.4 A group of order $p^2$ is either cyclic, or the product of two cyclic groups of 
order p. 


Proof. Let G be a group of order $p^2$. If G contains an element of order $p^2$, it is cyclic. If 
not, every element of G different from 1 has order p. We choose elements x and y of order 
p such that y is not in the subgroup $\langle x \rangle$. Since G is abelian (Proposition 7.3.3), Proposition 
2.11.4 shows that G is isomorphic to the product $\langle x \rangle \times \langle y \rangle$. □


The number of isomorphism classes of groups of order $p^e$ increases rapidly with e. 
There are five isomorphism classes of groups of order eight, 14 isomorphism classes of groups 
of order 16, and 51 isomorphism classes of groups of order 32. 


7.4 THE CLASS EQUATION OF THE ICOSAHEDRAL GROUP 


In this section we use the conjugacy classes in the icosahedral group $I$, the group of 
rotational symmetries of a dodecahedron, to study this interesting group. You may want to 
refer to a model of a dodecahedron or to an illustration while thinking about this. 

Let $\theta = 2\pi/3$. The icosahedral group contains the rotation by $\theta$ about a vertex v. This 
rotation has spin $(v, \theta)$, so we denote it by $\rho_{(v,\theta)}$. The 20 vertices form an $I$-orbit, and 
if $v'$ is another vertex, then $\rho_{(v,\theta)}$ and $\rho_{(v',\theta)}$ are conjugate elements of $I$. This follows from 
Corollary 5.1.28(b). The vertices form an orbit of order 20, so all of the rotations $\rho_{(v,\theta)}$ are 
conjugate. They are distinct, because the only spin that defines the same rotation as $(v, \theta)$ is 
$(-v, -\theta)$ and $-\theta \ne \theta$. So these rotations form a conjugacy class of order 20. 

Next, $I$ contains rotations with angle $2\pi/5$ about the center of a face, and the 12 faces 
form an orbit. Reasoning as above, we find a conjugacy class of order 12. Similarly, the 
rotations with angle $4\pi/5$ form a conjugacy class of order 12. 

Finally, $I$ contains a rotation with angle $\pi$ about the center of an edge. There are 30 
edges, which gives us 30 spins $(e, \pi)$. But $\pi = -\pi$ as angles of rotation. If e is the center of an edge, so is -e, and 
the spins $(e, \pi)$ and $(-e, -\pi)$ represent the same rotation. This conjugacy class contains only 
15 distinct rotations. 

The class equation of the icosahedral group is 

(7.4.1)    60 = 1 + 20 + 12 + 12 + 15. 


Note: Calling $(v, \theta)$ and $(e, \pi)$ spins isn’t accurate, because v and e can’t both have unit 
length. But this is obviously not an important point. 


Simple Groups 


A group G is simple if it is not the trivial group and if it contains no proper normal 
subgroup, that is, no normal subgroup other than $\langle 1 \rangle$ and G. (This use of the word simple does 
not mean “uncomplicated.” Its meaning here is roughly “not compound.”) Cyclic groups of 
prime order contain no proper subgroup at all; they are therefore simple groups. All other 
groups except the trivial group contain proper subgroups, though not necessarily proper 
normal subgroups. 

The proof of the following lemma is straightforward. 


Lemma 7.4.2 Let N be a normal subgroup of a group G. 


(a) If N contains an element x, then it contains the conjugacy class C(x) of x. 
(b) N is a union of conjugacy classes. 
(c) The order of N is the sum of the orders of the conjugacy classes that it contains. □


We now use the class equation to prove the following theorem. 

Theorem 7.4.3 The icosahedral group $I$ is a simple group. 


Proof. The order of a proper normal subgroup of the icosahedral group is a proper divisor 
of 60, and according to the lemma, it is also the sum of some of the terms on the right side of 
the class equation (7.4.1), including the term 1, which is the order of the conjugacy class of 
the identity element. There is no integer that satisfies both of those requirements, and this 
proves the theorem. □
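
The arithmetic behind this proof is a small finite search, sketched below in Python: form every sum of terms of the class equation (7.4.1) that includes the term 1, and keep those sums that divide 60. The variable names are assumptions of this illustration.

```python
from itertools import combinations

class_sizes = [1, 20, 12, 12, 15]   # right side of the class equation (7.4.1)
order = sum(class_sizes)            # 60

# A normal subgroup contains the identity, so the term 1 is always included.
other_terms = class_sizes[1:]

candidate_orders = set()
for r in range(len(other_terms) + 1):
    for subset in combinations(other_terms, r):
        total = 1 + sum(subset)
        if order % total == 0:      # the order of a subgroup divides |G|
            candidate_orders.add(total)

print(sorted(candidate_orders))
```

The only sums that survive are 1 and 60, which correspond to the trivial subgroup and the whole group.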


The property of being simple can be useful because one may run across normal 
subgroups, as the next theorem illustrates. 


Theorem 7.4.4 The icosahedral group is isomorphic to the alternating group $A_5$. Therefore 
$A_5$ is a simple group. 


Proof. To describe this isomorphism, we need to find a set S of five elements on which $I$ 
operates. This is rather subtle, but the five cubes that can be inscribed into a dodecahedron, 
one of which is shown below, form such a set. 

The icosahedral group operates on this set of five cubes, and this operation defines 
a homomorphism $\varphi: I \to S_5$, the associated permutation representation. We show that $\varphi$ 
defines an isomorphism from $I$ to the alternating group $A_5$. To do this, we use the fact that 
$I$ is a simple group, but the only information that we need about the operation is that it isn’t 
trivial. 

(7.4.5) One of the Cubes Inscribed in a Dodecahedron. 


The kernel of $\varphi$ is a normal subgroup of $I$. Since $I$ is a simple group, the kernel is 
either the trivial group $\langle 1 \rangle$ or the whole group $I$. If the kernel were the whole group, 
the operation of $I$ on the set of five cubes would be the trivial operation, which it is not. 
Therefore $\ker \varphi = \langle 1 \rangle$. This shows that $\varphi$ is injective. It defines an isomorphism from $I$ to 
its image in $S_5$. 

Next, we compose the homomorphism $\varphi$ with the sign homomorphism $\sigma: S_5 \to \{\pm 1\}$, 
obtaining a homomorphism $\sigma\varphi: I \to \{\pm 1\}$. If this homomorphism were surjective, its kernel 
would be a proper normal subgroup of $I$. This is not the case because $I$ is simple. Therefore 
$\sigma\varphi$ is the trivial homomorphism, which means that the image of $\varphi$ is contained 
in the kernel of $\sigma$, the alternating group $A_5$. Both $I$ and $A_5$ have order 60, and $\varphi$ is 
injective. So the image of $\varphi$, which is isomorphic to $I$, is $A_5$. □


7.5 CONJUGATION IN THE SYMMETRIC GROUP 


The least confusing way to describe conjugation in the symmetric group is to think of 
relabeling the indices. If the given indices are 1, 2, 3, 4, 5, and if we relabel them as 
a, b, c, d, e, respectively, the permutation p = (134)(25) is changed to (acd)(be). 

To write a formula for this procedure, we let $\varphi: I \to L$ denote the relabeling map 
that goes from the set $I$ of indices to the set L of letters: $\varphi(1) = a$, $\varphi(2) = b$, etc. Then the 
relabeled permutation is $\varphi \circ p \circ \varphi^{-1}$. This is explained as follows: 


First map letters to indices using $\varphi^{-1}$. 
Next, permute the indices by p. 
Finally, map indices back to letters using $\varphi$. 


We can use a permutation q of the indices to relabel in the same way. The result, the 
conjugate $p' = qpq^{-1}$, will be a new permutation of the same set of indices. For example, if 
we use q = (1452) to relabel, we will get 


$qpq^{-1} = (1452) \circ (134)(25) \circ (2541) = (435)(12) = p'$. 
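
The relabeling rule is easy to test by machine. The following Python sketch computes the conjugate of p = (134)(25) by q = (1452) in two ways: by composing the three permutations from right to left, and by applying q to each index inside the cycles of p. Storing permutations as dictionaries and the helper names `perm_from_cycles`, `compose`, `inverse`, and `cycles_of` are assumptions of this illustration.

```python
def perm_from_cycles(cycles, n):
    """Permutation of {1, ..., n} as a dict, built from disjoint cycles."""
    p = {i: i for i in range(1, n + 1)}
    for cyc in cycles:
        for k, a in enumerate(cyc):
            p[a] = cyc[(k + 1) % len(cyc)]
    return p


def compose(a, b):
    """Return a∘b: apply b first, then a."""
    return {i: a[b[i]] for i in b}


def inverse(a):
    """Inverse permutation of a."""
    return {image: i for i, image in a.items()}


def cycles_of(p):
    """Nontrivial cycles of p, each written starting from its smallest index."""
    seen, result = set(), []
    for start in sorted(p):
        cyc, i = [], start
        while i not in seen:
            seen.add(i)
            cyc.append(i)
            i = p[i]
        if len(cyc) > 1:
            result.append(tuple(cyc))
    return result


p = perm_from_cycles([(1, 3, 4), (2, 5)], 5)
q = perm_from_cycles([(1, 4, 5, 2)], 5)

# Method 1: compose q ∘ p ∘ q^{-1}, rightmost factor first.
conjugate = compose(compose(q, p), inverse(q))

# Method 2: relabel the indices inside each cycle of p using q.
relabeled = perm_from_cycles([(q[1], q[3], q[4]), (q[2], q[5])], 5)

print(cycles_of(conjugate))
```

The printed cycles (1 2) and (3 5 4) are the same permutation as (435)(12), because a cycle may be written starting from any of its indices.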


There are two things to notice. First, the relabeling will produce a permutation whose 
cycles have the same lengths as the original one. Second, by choosing the permutation q 
suitably, we can obtain any other permutation that has cycles of those same lengths. If we 
write one permutation above the other, ordered so that the cycles correspond, we can use 
the result as a table to define q. For example, to obtain p’ = (435)(12) as a conjugate of 
the permutation p = (134)(25), as we did above, we could write 


(134) (25) 
(435) (12) 


The relabeling permutation q is obtained by reading this table down: $1 \mapsto 4$, etc. 

Because a cycle can start from any of its indices, there will most often be several 
permutations q that yield the same conjugate. 

The next proposition sums up the discussion. 


Proposition 7.5.1 Two permutations p and p’ are conjugate elements of the symmetric 
group if and only if the cycles in their cycle decompositions have the same orders. □


We use Proposition 7.5.1 to determine the class equation of the symmetric group $S_4$. 
The cycle decomposition of a permutation gives us a partition of the set {1, 2, 3, 4}. The 
orders of the subsets making a partition of four can be 


1,1,1,1; 2,1,1; 2,2; 3,1; or 4. 


The permutations with cycles of these orders are the identity, the transpositions, the products 
of two disjoint transpositions, the 3-cycles, and the 4-cycles, respectively. 

There are six transpositions, three products of transpositions, eight 3-cycles, and six 
4-cycles. The proposition tells us that each of these sets forms one conjugacy class, so the 
class equation of $S_4$ is 


(7.5.2)    24 = 1 + 3 + 6 + 6 + 8. 

A similar computation shows that the class equation of the symmetric group $S_5$ is 

(7.5.3)    120 = 1 + 10 + 15 + 20 + 20 + 30 + 24. 
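
Because conjugacy classes in a symmetric group correspond to cycle types, the class equation can be obtained by counting permutations of each cycle type. The Python sketch below does this by exhaustive enumeration, which is practical only for small n. The function names `cycle_type` and `class_equation` are assumptions of this illustration.

```python
from collections import Counter
from itertools import permutations


def cycle_type(p):
    """Cycle lengths of the permutation p (a tuple on 0..n-1), largest first."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = p[i]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


def class_equation(n):
    """Map each cycle type in S_n to the number of permutations of that type."""
    return dict(Counter(cycle_type(p) for p in permutations(range(n))))


s4 = class_equation(4)
s5 = class_equation(5)
print(sorted(s4.values()), sorted(s5.values()))
```

For $S_5$ the two classes of order 20 are the 3-cycles and the products of a 3-cycle with a disjoint transposition.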


We saw in the previous section (7.4.4) that the alternating group $A_5$ is a simple group 
because it is isomorphic to the icosahedral group $I$, which is simple. We now prove that most 
alternating groups are simple. 


Theorem 7.5.4 For every $n \ge 5$, the alternating group $A_n$ is a simple group. 


To complete the picture we note that $A_2$ is the trivial group, $A_3$ is cyclic of order three, and 
that $A_4$ is not simple. The group of order four that consists of the identity and the three 
products of transpositions (12)(34), (13)(24), (14)(23) is a normal subgroup of $S_4$ and 
of $A_4$ (see (2.5.13)(b)). 


Lemma 7.5.5 


(a) For $n \ge 3$, the alternating group $A_n$ is generated by 3-cycles. 
(b) For $n \ge 5$, the 3-cycles form a single conjugacy class in the alternating group $A_n$. 

Proof. (a) This is analogous to the method of row reduction. Say that an even permutation 
p, not the identity, fixes m of the indices. We show that if we multiply p on the left by a 
suitable 3-cycle q, the product qp will fix at least m + 1 indices. Induction on m will complete 
the proof. 

If p is not the identity, it will contain either a k-cycle with $k \ge 3$, or a product 
of two 2-cycles. It does not matter how we number the indices, so we may suppose that 
$p = (1\,2\,3 \cdots k) \cdots$ or $p = (12)(34) \cdots$. Let q = (321). The product qp fixes the index 1 as 
well as all indices fixed by p. 


(b) Suppose that $n \ge 5$, and let q = (123). According to Proposition 7.5.1, the 3-cycles 
are conjugate in the symmetric group $S_n$. So if $q'$ is another 3-cycle, there is a permutation 
p such that $pqp^{-1} = q'$. If p is an even permutation, then q and $q'$ are conjugate in $A_n$. 
Suppose that p is odd. The transposition $\tau = (45)$ is in $S_n$ because $n \ge 5$, and $\tau q \tau^{-1} = q$. 
Then $p\tau$ is even, and $(p\tau)q(p\tau)^{-1} = q'$. □


Proof. We now proceed to the proof of Theorem 7.5.4. Let N be a nontrivial normal subgroup 
of the alternating group $A_n$ with $n \ge 5$. We must show that N is the whole group $A_n$. It 
suffices to show that N contains a 3-cycle. If so, then (7.5.5)(b) will show that N contains 
every 3-cycle, and (7.5.5)(a) will show that $N = A_n$. 

We are given that N is a normal subgroup and that it contains a permutation x different 
from the identity. Three operations are allowed: We may multiply, invert, and conjugate. 
For example, if g is any element of $A_n$, then $gxg^{-1}$ and $x^{-1}$ are in N too. So is their product, 
the commutator $gxg^{-1}x^{-1}$. And since g can be arbitrary, these commutators give us many 
elements that must be in N. 

Our first step is to note that a suitable power of x will have prime order, say order 
$\ell$. We may replace x by this power, so we may assume that x has order $\ell$. Then the cycle 
decomposition of x will consist of $\ell$-cycles and 1-cycles. 

Unfortunately, the rest of the proof requires looking separately at several cases. In each 
of the cases, we compute a commutator $gxg^{-1}x^{-1}$, hoping to be led to a 3-cycle. Appropriate 
elements can be found by experiment. 


Case 1: x has order $\ell \ge 5$. 

How the indices are numbered is irrelevant, so we may suppose that x contains the $\ell$-cycle 
$(1\,2\,3\,4\,5 \cdots \ell)$, say $x = (1\,2\,3\,4\,5 \cdots \ell)y$, where y is a permutation of the remaining indices. Let 
g = (432). Then, composing from right to left (first do the rightmost factor), 


$gxg^{-1}x^{-1} = [(432)] \circ [(1\,2\,3\,4\,5 \cdots \ell)y] \circ [(234)] \circ [y^{-1}(\ell \cdots 5\,4\,3\,2\,1)] = (245)$. 


The commutator is a 3-cycle. 


Case 2: x has order 3. 


There is nothing to prove if x is a 3-cycle. If not, then x contains at least two 3-cycles, say 
x = (123)(456)y. Let g = (432). Then $gxg^{-1}x^{-1} = (15243)$. The commutator has order 
5. We go back to Case 1. 


Case 3a: x has order 2 and it contains a 1-cycle. 

Since it is an even permutation, x must contain at least two 2-cycles, say x = (12)(34)(5)y. 
Let g = (531). Then $gxg^{-1}x^{-1} = (15243)$. The commutator has order 5, and we go back 
to Case 1 again. 


Case 3b: x has order $\ell = 2$, and contains no 1-cycles. 

Since $n \ge 5$, x contains more than two 2-cycles. Say x = (12)(34)(56)y. Let g = (531). 
Then $gxg^{-1}x^{-1} = (153)(246)$. The commutator has order 3 and we go back to Case 2. 


These are the possibilities for an even permutation of prime order, so the proof of the 
theorem is complete. □
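
The commutators used in the four cases can be checked mechanically. The Python sketch below evaluates the commutator of g and x with the rightmost factor applied first, taking y to be the identity and, in Case 1, the prime 7 as a sample order. Those choices, the dictionary encoding of permutations, and the helper names are assumptions of this illustration.

```python
def perm_from_cycles(cycles, n):
    """Permutation of {1, ..., n} as a dict, built from disjoint cycles."""
    p = {i: i for i in range(1, n + 1)}
    for cyc in cycles:
        for k, a in enumerate(cyc):
            p[a] = cyc[(k + 1) % len(cyc)]
    return p


def compose(a, b):
    """Return a∘b: apply b first, then a."""
    return {i: a[b[i]] for i in b}


def inverse(a):
    """Inverse permutation of a."""
    return {image: i for i, image in a.items()}


def cycles_of(p):
    """Nontrivial cycles of p, each written starting from its smallest index."""
    seen, result = set(), []
    for start in sorted(p):
        cyc, i = [], start
        while i not in seen:
            seen.add(i)
            cyc.append(i)
            i = p[i]
        if len(cyc) > 1:
            result.append(tuple(cyc))
    return result


def commutator_cycles(x_cycles, g_cycles, n):
    """Cycles of g x g^{-1} x^{-1} in S_n, rightmost factor applied first."""
    x = perm_from_cycles(x_cycles, n)
    g = perm_from_cycles(g_cycles, n)
    product = compose(compose(g, x), compose(inverse(g), inverse(x)))
    return cycles_of(product)


case_1 = commutator_cycles([(1, 2, 3, 4, 5, 6, 7)], [(4, 3, 2)], 7)
case_2 = commutator_cycles([(1, 2, 3), (4, 5, 6)], [(4, 3, 2)], 6)
case_3a = commutator_cycles([(1, 2), (3, 4)], [(5, 3, 1)], 5)
case_3b = commutator_cycles([(1, 2), (3, 4), (5, 6)], [(5, 3, 1)], 6)

print(case_1, case_2, case_3a, case_3b)
```

Taking y to be the identity loses nothing here, because g moves only indices that y fixes, so y and its inverse cancel in the commutator.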


7.6 NORMALIZERS 


We consider the orbit of a subgroup H of a group G for the operation of conjugation by G. 
The orbit of [H] is the set of conjugate subgroups $[gHg^{-1}]$, with g in G. The stabilizer of 
[H] for this operation is called the normalizer of H, and is denoted by N(H): 


(7.6.1)    $N(H) = \{g \in G \mid gHg^{-1} = H\}$. 

The Counting Formula reads 

(7.6.2)    $|G| = |N(H)| \cdot (\text{number of conjugate subgroups})$. 

The number of conjugate subgroups is equal to the index [G : N(H)]. 


Proposition 7.6.3 Let H be a subgroup of a group G, and let N be the normalizer of H. 


(a) H is a normal subgroup of N. 
(b) H is a normal subgroup of G if and only if N = G. 
(c) |H| divides |N| and |N| divides |G|. □


For example, let H be the cyclic subgroup of order two of the symmetric group $S_5$ that 
is generated by the element p = (12)(34). The conjugacy class C(p) contains the 15 pairs 
of disjoint transpositions, each of which generates a conjugate subgroup of H. The counting 
formula shows that the normalizer N(H) has order eight: $120 = 8 \cdot 15$. 
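
This example is small enough to verify by enumeration. The Python sketch below lists the elements of $S_5$, conjugates the subgroup generated by (12)(34), and counts both the normalizer and the distinct conjugate subgroups. Indices are shifted to 0 through 4, and the helper names are assumptions of this illustration.

```python
from itertools import permutations


def compose(a, b):
    """Return a∘b: apply b first, then a (permutations stored as tuples)."""
    return tuple(a[b[i]] for i in range(len(b)))


def inverse(a):
    """Inverse permutation of a."""
    inv = [0] * len(a)
    for i, image in enumerate(a):
        inv[image] = i
    return tuple(inv)


S5 = list(permutations(range(5)))

identity = tuple(range(5))
p = (1, 0, 3, 2, 4)                 # (12)(34), written on the indices 0..4
H = frozenset({identity, p})        # cyclic subgroup of order two


def conjugate_subgroup(g, subgroup):
    """The subgroup g H g^{-1}."""
    g_inv = inverse(g)
    return frozenset(compose(compose(g, h), g_inv) for h in subgroup)


normalizer = [g for g in S5 if conjugate_subgroup(g, H) == H]
conjugates = {conjugate_subgroup(g, H) for g in S5}

print(len(normalizer), len(conjugates))
```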


7.7 THE SYLOW THEOREMS 


The Sylow Theorems describe the subgroups of prime power order of an arbitrary finite 
group. They are named after the Norwegian mathematician Ludwig Sylow, who discovered 
them in the 19th century. 

Let G be a group of order n, and let p be a prime integer that divides n. Let $p^e$ denote 
the largest power of p that divides n, so that 


(7.7.1)    $n = p^e m$, 


where m is an integer not divisible by p. Subgroups H of G of order $p^e$ are called Sylow 
p-subgroups of G. A Sylow p-subgroup is a p-group whose index in the group isn’t divisible 
by p. 

Theorem 7.7.2 First Sylow Theorem. A finite group whose order is divisible by a prime p 
contains a Sylow p-subgroup. 


Proofs of the Sylow Theorems are at the end of the section. 


Corollary 7.7.3 A finite group whose order is divisible by a prime p contains an element of 
order p. 


Proof. Let G be such a group, and let H be a Sylow p-subgroup of G. Then H contains an 
element x different from 1. The order of x divides the order of H, so it is a positive power 
of p, say $p^k$. Then $x^{p^{k-1}}$ has order p. □


This corollary isn’t obvious. We already know that the order of any element divides the 
order of the group, but we might imagine a group of order 6, for example, made up of the 
identity 1 and five elements of order 2. No such group exists. A group of order 6 must contain 
an element of order 3 and an element of order 2. 


The remaining Sylow Theorems give additional information about the Sylow sub- 
groups. 


Theorem 7.7.4 Second Sylow Theorem. Let G be a finite group whose order is divisible by 
a prime p. 

(a) The Sylow p-subgroups of G are conjugate subgroups. 

(b) Every subgroup of G that is a p-group is contained in a Sylow p-subgroup. 


A conjugate subgroup of a Sylow p-subgroup will be a Sylow p-subgroup too. 


Corollary 7.7.5 A group G has just one Sylow p-subgroup H if and only if that subgroup is 
normal. □


Theorem 7.7.6 Third Sylow Theorem. Let G be a finite group whose order n is divisible 
by a prime p. Say that $n = p^e m$, where p does not divide m, and let s denote the number 
of Sylow p-subgroups. Then s divides m and s is congruent to 1 modulo p: s = kp + 1 for 
some integer $k \ge 0$. 


Before proving the Sylow theorems, we will use them to classify groups of orders 6, 15, 
and 21. These examples show the power of the theorems, but the classification of groups of 
order n is not easy when n has many factors. There are just too many possibilities. 
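
The two conditions of the Third Sylow Theorem leave very few candidates for s when n is small. The Python sketch below lists, for a given order n and prime p, every integer s that divides m and is congruent to 1 modulo p. The function name `sylow_count_candidates` is an assumption of this illustration; the theorem only says that the true number of Sylow p-subgroups is one of the listed values.

```python
def sylow_count_candidates(n, p):
    """Integers s with s | m and s ≡ 1 (mod p), where n = p^e * m and p ∤ m."""
    m = n
    while m % p == 0:
        m //= p
    return [s for s in range(1, m + 1) if m % s == 0 and s % p == 1]


for n, primes in [(15, (3, 5)), (6, (3, 2)), (21, (7, 3))]:
    for p in primes:
        print(n, p, sylow_count_candidates(n, p))
```

For order 15 both lists contain only 1, for order 6 the Sylow 2-subgroups allow 1 or 3, and for order 21 the Sylow 3-subgroups allow 1 or 7, which matches the case analysis in the proof of Proposition 7.7.7.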


Proposition 7.7.7 


(a) Every group of order 15 is cyclic. 

(b) There are two isomorphism classes of groups of order 6, the class of the cyclic group $C_6$ 
and the class of the symmetric group $S_3$. 

(c) There are two isomorphism classes of groups of order 21: the class of the cyclic group 
$C_{21}$, and the class of a group G generated by two elements x and y that satisfy the 
relations $x^7 = 1$, $y^3 = 1$, $yx = x^2y$. 

Proof. (a) Let G be a group of order 15. According to the Third Sylow Theorem, the number 
of its Sylow 3-subgroups divides 5 and is congruent to 1 modulo 3. The only such integer is 1. 
Therefore there is one Sylow 3-subgroup, say H, and it is a normal subgroup. For similar 
reasons, there is just one Sylow 5-subgroup, say K, and it is normal. The subgroup H is cyclic 
of order 3, and K is cyclic of order 5. The intersection $H \cap K$ is the trivial group. Proposition 
2.11.4(d) tells us that G is isomorphic to the product group $H \times K$. So all groups of order 15 
are isomorphic to the product $C_3 \times C_5$ of cyclic groups and to each other. The cyclic group 
$C_{15}$ is one such group, so all groups of order 15 are cyclic. 


(b) Let G be a group of order 6. The First Sylow Theorem tells us that G contains a Sylow 
3-subgroup H, a cyclic group of order 3, and a Sylow 2-subgroup K, cyclic of order 2. 
The Third Sylow Theorem tells us that the number of Sylow 3-subgroups divides 2 and is 
congruent to 1 modulo 3. The only such integer is 1. So there is one Sylow 3-subgroup H, 
and it is a normal subgroup. The same theorem also tells us that the number of Sylow 
2-subgroups divides 3 and is congruent to 1 modulo 2. That number is either 1 or 3. 


Case 1: Both H and K are normal subgroups. 


As in the previous example, G is isomorphic to the product group H x K, which is 
abelian. All abelian groups of order 6 are cyclic. 


Case 2: G contains 3 Sylow 2-subgroups, say $K_1$, $K_2$, $K_3$. 

The group G operates by conjugation on the set $S = \{[K_1], [K_2], [K_3]\}$ of order 
three, and this gives us a homomorphism $\varphi: G \to S_3$ from G to the symmetric group, the 
associated permutation representation (6.11.2). The Second Sylow Theorem tells us that 
the operation on S is transitive, so the stabilizer in G of the element $[K_i]$, which is the 
normalizer $N(K_i)$, has order 2. It is equal to $K_i$. Since $K_1 \cap K_2 = \{1\}$, the identity is the 
only element of G that fixes all elements of S. The operation is faithful, and the permutation 
representation $\varphi$ is injective. Since G and $S_3$ have the same order, $\varphi$ is an isomorphism. 


(c) Let G be a group of order 21. The Third Sylow Theorem shows that the Sylow 7-subgroup 
K must be normal, and that the number of Sylow 3-subgroups is 1 or 7. Let x be a generator 
for K, and let y be a generator for one of the Sylow 3-subgroups H. Then $x^7 = 1$ and $y^3 = 1$, 
so $H \cap K = \{1\}$, and therefore the product map $H \times K \to G$ is injective (2.11.4)(a). Since 
G has order 21, the product map is bijective. The elements of G are the products $x^i y^j$ with 
$0 \le i < 7$ and $0 \le j < 3$. 

Since K is a normal subgroup, $yxy^{-1}$ is an element of K, a power of x, say $x^i$, with i in 
the range $1 \le i < 7$. So the elements x and y satisfy the relations 


(7.7.8)    $x^7 = 1, \quad y^3 = 1, \quad yx = x^i y$. 


These relations are enough to determine the multiplication table for the group. However, 
the relation $y^3 = 1$ restricts the possible exponents i, because it implies that $y^3xy^{-3} = x$: 


$x = y^3xy^{-3} = y^2x^iy^{-2} = yx^{i^2}y^{-1} = x^{i^3}$. 


Therefore $i^3 \equiv 1$ modulo 7. This tells us that i must be 1, 2, or 4. 
The exponent i = 3, for instance, would imply $x = x^{27} = x^6 = x^{-1}$. Then $x^2 = 1$ and 
also $x^7 = 1$, from which it follows that x = 1. The group defined by the relations (7.7.8) with 
i = 3 is a cyclic group of order 3, generated by y. 


Case 1: $yxy^{-1} = x$. Then x commutes with y. Both H and K are normal subgroups. As 
before, G is isomorphic to a direct product of cyclic groups of orders 3 and 7, and is a cyclic 
group. 


Case 2: $yxy^{-1} = x^2$. As noted above, the multiplication table is determined. But we still 
have to show that this group actually exists. This comes down to showing that the relations 
don’t cause the group to collapse, as happens when i = 3. We’ll learn a systematic method 
for doing this, the Todd-Coxeter Algorithm, in Section 7.11. Another way is to exhibit the 
group explicitly, for example as a group of matrices. Some experimentation is required to do 
this. 

Since the group we are looking for is supposed to contain an element of order 7, it is 
natural to try to find suitable matrices with entries modulo 7. At least we can write down a 
$2 \times 2$ matrix with entries in $\mathbb{F}_7$ that has order 7, namely the matrix x below. Then y can be 
found by trial and error. The matrices 


$x = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ and $y = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$ 


with entries in $\mathbb{F}_7$ satisfy the relations $x^7 = 1$, $y^3 = 1$, $yx = x^2y$, and they generate a group 
of order 21. 
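
A short computation confirms that such matrices exist. The Python sketch below assumes the pair x with rows (1, 1), (0, 1) and y with rows (2, 0), (0, 1), with entries taken modulo 7; this is one pair that satisfies the relations, chosen for this illustration. It checks the three relations and generates the group by closing under multiplication. The 4-tuple encoding and the names `mul`, `power`, and `generate` are likewise assumptions.

```python
P_MOD = 7  # entries are taken in the field F_7

IDENTITY = (1, 0, 0, 1)


def mul(A, B):
    """Product of 2x2 matrices stored as (a, b, c, d), reduced mod P_MOD."""
    a, b, c, d = A
    e, f, g, h = B
    return ((a * e + b * g) % P_MOD, (a * f + b * h) % P_MOD,
            (c * e + d * g) % P_MOD, (c * f + d * h) % P_MOD)


def power(A, k):
    """A raised to the k-th power, for k >= 0."""
    result = IDENTITY
    for _ in range(k):
        result = mul(result, A)
    return result


def generate(gens):
    """Closure of the generators under multiplication (finite group)."""
    group = {IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


x = (1, 1, 0, 1)   # rows (1, 1) and (0, 1): assumed generator of order 7
y = (2, 0, 0, 1)   # rows (2, 0) and (0, 1): assumed generator of order 3

group = generate([x, y])
print(power(x, 7) == IDENTITY, power(y, 3) == IDENTITY,
      mul(y, x) == mul(power(x, 2), y), len(group))
```

The generated group consists of the upper triangular matrices with rows (a, b), (0, 1) where a is 1, 2, or 4 and b is arbitrary in $\mathbb{F}_7$, which accounts for the 21 elements.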


Case 3: $yxy^{-1} = x^4$. Then $y^2xy^{-2} = x^{16} = x^2$. We note that $y^2$ is also an element of order 3. So we 
may replace y by $y^2$, which is another generator for H. The result is that the exponent 4 is 
replaced by 2, which puts us back in the previous case. 


Thus there are two isomorphism classes of groups of order 21, as claimed. □


We use two lemmas in the proof of the first Sylow Theorem. 


Lemma 7.7.9 Let U be a subset of a group G. The order of the stabilizer Stab([U]) of [U] 
for the operation of left multiplication by G on the set of its subsets divides both of the 
orders |U| and |G|. 


Proof. If H is a subgroup of G, the H-orbit of an element u of G for left multiplication by 
H is the right coset Hu. Let H be the stabilizer of [U]. Then multiplication by H permutes 
the elements of U, so U is partitioned into H-orbits, which are right cosets. Each coset has 
order |H|, so |H| divides |U|. Because H is a subgroup, |H| divides |G|. □


Lemma 7.7.10 Let n be an integer of the form $p^e m$, where e > 0 and p does not divide m. 
The number N of subsets of order $p^e$ in a set of order n is not divisible by p. 


Proof. The number N is the binomial coefficient 


$N = \binom{n}{p^e} = \dfrac{n(n-1) \cdots (n-k) \cdots (n-p^e+1)}{p^e(p^e-1) \cdots (p^e-k) \cdots 1}$. 

The reason that $N \not\equiv 0$ modulo p is that every time p divides a term (n - k) in the numerator 
of N, it also divides the term $(p^e - k)$ of the denominator the same number of times: 
If we write k, with $0 < k < p^e$, in the form $k = p^i \ell$, where p does not divide $\ell$, then i < e. Therefore 
$(p^e - k)$ and $(n - k) = (p^e m - k)$ are both divisible by $p^i$ but not by $p^{i+1}$. □
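
The lemma can be spot-checked numerically with exact integer arithmetic. The Python sketch below evaluates the binomial coefficient modulo p for a range of small primes p, exponents e, and cofactors m not divisible by p. The function name `subset_count_mod_p` and the ranges tested are assumptions of this illustration, and a finite check of this kind is of course not a proof.

```python
from math import comb


def subset_count_mod_p(p, e, m):
    """Number of subsets of order p^e in a set of order p^e * m, reduced mod p."""
    n = p**e * m
    return comb(n, p**e) % p


residues = {
    (p, e, m): subset_count_mod_p(p, e, m)
    for p in (2, 3, 5, 7)
    for e in (1, 2, 3)
    for m in range(1, 12)
    if m % p != 0
}

print(all(r != 0 for r in residues.values()))
```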


Proof of the First Sylow Theorem. Let S be the set of all subsets of G of order p®. One of 
the subsets is a Sylow subgroup, but instead of finding it directly we look at the operation of 
left multiplication by G on S. We will show that one of the subsets [U] of order p® has a 
stabilizer of order p*. That stabilizer will be the subgroup we are looking for. 


We decompose S into orbits for the operation of left multiplication, obtaining an 
equation of the form 


N=|S|\= S> |O}. 
orbits O 


According to Lemma 7.7.10, p doesn’t divide N. So at least one orbit has an order that isn’t
divisible by p, say the orbit $O_{[U]}$ of the subset [U]. Let H be the stabilizer of [U]. Lemma
7.7.9 tells us that the order of H divides the order of U, which is $p^e$. So |H| is a power of p.
We have $|H| \cdot |O_{[U]}| = |G| = p^e m$, and $|O_{[U]}|$ isn’t divisible by p. Therefore $|O_{[U]}| = m$
and $|H| = p^e$. So H is a Sylow p-subgroup. □
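The argument can be watched in action on a small group. The sketch below is an illustration under stated assumptions: it takes G to be the symmetric group on three letters (order 6 = 2·3) and p = 2, and all function names are hypothetical. It lets G act by left multiplication on the subsets of order 2 and looks for an orbit whose order is not divisible by 2.

```python
from itertools import combinations, permutations

G = list(permutations(range(3)))           # the six elements of S3, as tuples

def mul(g, h):
    """Composition g∘h: apply h first, then g."""
    return tuple(g[h[i]] for i in range(3))

# |G| = 6 = 2 * 3, so p = 2, e = 1, m = 3; S is the set of subsets of order 2.
subsets = [frozenset(c) for c in combinations(G, 2)]

def translate(g, U):
    """Left multiplication of the subset U by g."""
    return frozenset(mul(g, u) for u in U)

def orbit(U):
    return frozenset(translate(g, U) for g in G)

def stabilizer(U):
    return [g for g in G if translate(g, U) == U]

orbit_sizes = sorted(len(o) for o in {orbit(U) for U in subsets})
U = next(U for U in subsets if len(orbit(U)) % 2 != 0)   # orbit of odd order
H = stabilizer(U)
print(len(subsets), orbit_sizes, len(H))
```

Here N = 15 is odd, the orbits have orders 3, 3, 3, 6, and the stabilizer of a subset in an orbit of order 3 is a subgroup of order 2, that is, a Sylow 2-subgroup.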


Proof of the Second Sylow Theorem. Suppose that we are given a p-subgroup K and a 
Sylow p-subgroup H. We will show that some conjugate subgroup H’ of H contains K,
which will prove (b). If K is also a Sylow p-subgroup, it will be equal to the conjugate 
subgroup H’, so (a) will be proved as well. 

We choose a set C on which the group G operates, with these properties: p does not 
divide the order |C|, the operation is transitive, and C contains an element c whose stabilizer 
is H. The set of left cosets of H in G has these properties, so such a set exists. (We prefer 
not to clutter up the notation by explicit reference to cosets.) 

We restrict the operation of G on C to the p-group K. Since p doesn’t divide |C|, there
is a fixed point c’ for the operation of K. This is the Fixed Point Theorem 7.3.2. Since the
operation of G is transitive, c’ = gc for some g in G. The stabilizer of c’ is the conjugate
subgroup $gHg^{-1}$ of H (6.7.7), and since K fixes c’, the stabilizer contains K. □


Proof of the Third Sylow Theorem. We write $|G| = p^e m$ as before. Let s denote the number
of Sylow p-subgroups. The Second Sylow Theorem tells us that the operation of G on the
set S of Sylow p-subgroups is transitive. The stabilizer of a particular Sylow p-subgroup [H]
is the normalizer N = N(H) of H. The counting formula tells us that the order of S, which
is s, is equal to the index [G: N]. Since N contains H (7.6.3) and since [G: H] is equal to m,
we have m = [G: N][N: H], so s divides m.

Next, we decompose the set S into orbits for the operation of conjugation by H. The 
H-orbit of [H] has order 1. Since H is a p-group, the order of any H-orbit is a power of p. 
To show that $s \equiv 1$ modulo p, we show that no element of S except [H] is fixed by H.

Suppose that H’ is a p-Sylow subgroup and that conjugation by H fixes [H’]. Then H
is contained in the normalizer N’ of H’, so both H and H’ are Sylow p-subgroups of N’. The
second Sylow theorem tells us that the p-Sylow subgroups of N’ are conjugate subgroups of
N’. But H’ is a normal subgroup of N’ (7.6.3)(a). Therefore H’ = H. □
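The conclusions of the Second and Third Sylow Theorems can be checked by brute force in a small case. This sketch assumes the example G = S4 with p = 3, so that |G| = 24 = 3·8 and m = 8; the helper names are hypothetical. It lists the subgroups of order 3, counts them, and compares them with the conjugates of one of them.

```python
from itertools import permutations

n = 4
G = list(permutations(range(n)))            # S4, of order 24 = 3 * 8
identity = tuple(range(n))

def mul(g, h):
    """Composition g∘h: apply h first, then g."""
    return tuple(g[h[i]] for i in range(n))

def inverse(g):
    r = [0] * n
    for i, gi in enumerate(g):
        r[gi] = i
    return tuple(r)

def order(g):
    k, h = 1, g
    while h != identity:
        h, k = mul(g, h), k + 1
    return k

p, m = 3, 8
# Every subgroup of order 3 has the form {1, g, g^2} with g of order 3.
sylow = {frozenset({identity, g, mul(g, g)}) for g in G if order(g) == 3}
s = len(sylow)

def conjugate(g, H):
    return frozenset(mul(mul(g, h), inverse(g)) for h in H)

H = next(iter(sylow))
conjugates = {conjugate(g, H) for g in G}
print(s, m % s == 0, s % p == 1, conjugates == sylow)
```

The count s = 4 divides m = 8 and is congruent to 1 modulo 3, and the conjugates of a single Sylow 3-subgroup account for all of them.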


7.8 GROUPS OF ORDER 12


We use the Sylow Theorems to classify groups of order 12. This theorem serves to illustrate 
the fact that classifying groups becomes complicated when the order has several factors. 


Theorem 7.8.1 There are five isomorphism classes of groups of order 12. They are 
represented by: 


• the product of cyclic groups $C_4 \times C_3$,

• the product of cyclic groups $C_2 \times C_2 \times C_3$,

• the alternating group $A_4$,

• the dihedral group $D_6$,

• the group generated by elements x and y, with relations $x^4 = 1$, $y^3 = 1$, $xy = y^2x$.


All but the last of these groups should be familiar. The product group $C_4 \times C_3$ is isomorphic
to $C_{12}$, and $C_2 \times C_2 \times C_3$ is isomorphic to $C_2 \times C_6$ (see Proposition 2.11.3).


Proof. Let G be a group of order 12, let H be a Sylow 2-subgroup of G, which has order 4, 
and let K be a Sylow 3-subgroup of order 3. It follows from the Third Sylow Theorem that 
the number of Sylow 2-subgroups is either 1 or 3, and that the number of Sylow 3-subgroups 
is 1 or 4. Also, H is a group of order 4 and is therefore either a cyclic group $C_4$ or the Klein
four group $C_2 \times C_2$ (Proposition 2.11.5). Of course K is cyclic.

Though this is not necessary for the proof, we begin by showing that at least one of the
two subgroups, H or K, is normal. If K is not normal, there will be four Sylow 3-subgroups
conjugate to K, say $K_1, \ldots, K_4$, with $K_1 = K$. These groups have prime order, so the
intersection of any two of them is the trivial group <1>. Together they contain 1 + 4·2 = 9
elements, so there are only three elements of G that are not in any of the groups $K_i$. This
fact is shown schematically below.


A Sylow 2-subgroup H has order 4, and $H \cap K_i$ = <1>. Therefore H consists of the three
elements not in any of the groups $K_i$, together with 1. This describes H for us and shows
that there is only one Sylow 2-subgroup. Thus H is normal.

Next, we note that $H \cap K$ = <1>, so the product map $H \times K \to G$ is a bijective map
of sets (2.11.4). Every element of G has a unique expression as a product hk, with h in H
and k in K.


Case 1: H and K are both normal.


Then G is isomorphic to the product group $H \times K$ (2.11.4). Since there are two
possibilities for H and one for K, there are two possibilities for G:


$$G \approx C_4 \times C_3 \quad\text{or}\quad G \approx C_2 \times C_2 \times C_3.$$

These are the abelian groups of order 12. 


Case 2: K is not normal.


There are four conjugate Sylow 3-subgroups, $K_1, \ldots, K_4$, and G operates by conjugation
on this set of four. This operation determines a permutation representation, a
homomorphism $\varphi: G \to S_4$ to the symmetric group. We’ll show that $\varphi$ maps G isomorphically
to the alternating group $A_4$.

The normalizer $N_i$ of $K_i$ contains $K_i$, and the counting formula shows that $|N_i| = 3$.
Therefore $N_i = K_i$. Since the only element in common to the subgroups $K_i$ is the identity,
only the identity stabilizes all of these subgroups. Thus the operation of G is faithful, $\varphi$ is
injective, and G is isomorphic to its image in $S_4$.

Since G has four subgroups of order 3, it contains eight elements of order 3. Their
images are the 3-cycles in $S_4$, which generate $A_4$ (7.5.5). So the image of G contains $A_4$.
Since G and $A_4$ have the same order, the image is equal to $A_4$.


Case 3: K is normal, but H is not. 
Then H operates by conjugation on $K = \{1, y, y^2\}$. Since H is not normal, it contains
an element x that doesn’t commute with y, and then $xyx^{-1} = y^2$.


Case 3a: K is normal, H is not normal, and H is a cyclic group. 
The element x generates H, so G is generated by elements x and y, with the relations 


(7.8.2) $x^4 = 1$, $y^3 = 1$, $xy = y^2x$.


These relations determine the multiplication table of G, so there is at most one isomorphism 
class of such groups. But we must show that these relations don’t collapse the group further, 
and as with groups of order 21 (see 7.7.8), it is simplest to represent the group by matrices. 
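We'll use complex matrices here. Let $\omega$ be the complex cube root of unity $e^{2\pi i/3}$. The
complex matrices


(7.8.3) $x = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad y = \begin{pmatrix} \omega & 0 \\ 0 & \omega^2 \end{pmatrix}$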


satisfy the three relations, and they generate a group of order 12. 
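The claim about matrices can be checked with exact arithmetic. The sketch below rests on an assumption that should be stated plainly: it takes x to be the matrix with rows (0, 1), (−1, 0) and y to be the diagonal matrix with entries ω, ω², which is one pair of matrices satisfying (7.8.2). Both are monomial matrices whose nonzero entries are sixth roots of unity, so each can be stored as a pair of (column, exponent) records; this encoding and the function names are illustrative.

```python
# A 2x2 monomial matrix is stored as ((col0, exp0), (col1, exp1)): row i has its
# single nonzero entry zeta**exp_i in column col_i, where zeta = exp(2*pi*i/6).
# Then omega = zeta**2 and -1 = zeta**3, so all arithmetic is exact (mod 6).

def mul(A, B):
    """Matrix product AB of two monomial matrices."""
    return tuple((B[c][0], (e + B[c][1]) % 6) for c, e in A)

def power(A, k):
    P = I
    for _ in range(k):
        P = mul(P, A)
    return P

I = ((0, 0), (1, 0))
x = ((1, 0), (0, 3))      # assumed matrix [[0, 1], [-1, 0]]
y = ((0, 2), (1, 4))      # assumed matrix diag(omega, omega**2)

def generate(gens):
    """All products of the generators (this is a group because it is finite)."""
    seen, frontier = {I}, [I]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen

group = generate([x, y])
print(power(x, 4) == I, power(y, 3) == I, mul(x, y) == mul(power(y, 2), x), len(group))
```

With these matrices the three relations hold and the generated group has twelve elements, so the relations (7.8.2) do not collapse the group below order 12.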


Case 3b: K is normal, H is not normal, and $H \approx C_2 \times C_2$.


The stabilizer of y for the operation of H by conjugation on the set $\{y, y^2\}$ has order 2.
So H contains an element z ≠ 1 such that zy = yz and also an element x such that $xy = y^2x$.
Since H is abelian, xz = zx. Then G is generated by three elements x, y, z, with relations


$$x^2 = 1,\quad y^3 = 1,\quad z^2 = 1,\quad yz = zy,\quad xz = zx,\quad xy = y^2x.$$

These relations determine the multiplication table of the group, so there is at most one
isomorphism class of such groups. The dihedral group $D_6$ isn’t one of the four groups
described before, so it must be this one. Therefore G is isomorphic to $D_6$. □


7.9 THE FREE GROUP 


We have seen that one can compute in the symmetric group $S_3$ using the usual generators x
and y, together with the relations $x^3 = 1$, $y^2 = 1$, and $yx = x^2y$. In the rest of the chapter,
we study generators and relations in other groups. 

We first consider groups with generators that satisfy no relations other than ones (such 
as the associative law) that are implied by the group axioms. A set of group elements that 
satisfy no relations except those implied by the axioms is called free, and a group that has a 
free set of generators is called a free group. 

To describe free groups, we start with an arbitrary set, say S = {a, b, c, ...}. We call its 
elements “symbols,” and we define a word to be a finite string of symbols, in which repetition 
is allowed. For instance a, aa, ba, and aaba are words. Two words can be composed by 
juxtaposition, that is, placing them side by side: 


$$aa,\; ba \rightsquigarrow aaba.$$


This is an associative law of composition on the set W of words. We include the “empty
word” in W as an identity element, and we use the symbol 1 to denote it. Then the set
W becomes what is called the free semigroup on the set S. It isn’t a group because it lacks 
inverses, and adding inverses complicates things a little. 

Let S’ be the set that consists of symbols a and $a^{-1}$ for every a in S:


(7.9.1) $S' = \{a, a^{-1}, b, b^{-1}, c, c^{-1}, \ldots\}$,


and let W’ be the semigroup of words made using the symbols in S’. If a word looks like


$$\cdots x x^{-1} \cdots \quad\text{or}\quad \cdots x^{-1} x \cdots$$


for some x in S, we may agree to cancel the two symbols x and $x^{-1}$ to reduce the length of
the word. A word is called reduced if no such cancellation can be made. Starting with any
word w in W’, we can perform a finite sequence of cancellations and must eventually get a
reduced word $w_0$, possibly the empty word 1. We call $w_0$ a reduced form of w.

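There may be more than one way to proceed with cancellation. For instance, starting
with $w = abb^{-1}c^{-1}cb$, we can proceed in two ways:


   cancelling $bb^{-1}$ first:      $abb^{-1}c^{-1}cb \;\to\; ac^{-1}cb \;\to\; ab$

   cancelling $c^{-1}c$ first:      $abb^{-1}c^{-1}cb \;\to\; abb^{-1}b \;\to\; ab$


The same reduced word is obtained at the end, though the symbols may come from different
places in the original word: in the first reduction the b that remains is the last symbol of w,
while in the second, if the pair $b^{-1}b$ is cancelled in the last step, it is the second symbol
of w. This is always true.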


Proposition 7.9.2 There is only one reduced form of a given word w. 
Proof. We use induction on the length of w. If w is reduced, there is nothing to show. If not,
there must be some pair of symbols that can be cancelled, say the pair $xx^{-1}$ displayed in


$$w = \cdots x x^{-1} \cdots.$$


(Let’s allow x to denote any element of S’, with the understanding that if $x = a^{-1}$ then
$x^{-1} = a$.) If we show that we can obtain every reduced form of w by cancelling the pair $xx^{-1}$
first, the proposition will follow by induction, because the word obtained by cancelling that
pair is shorter.

Let $w_0$ be a reduced form of w. It is obtained from w by some sequence of cancellations.
The first case is that our pair $xx^{-1}$ is cancelled at some step in this sequence. If so, we may
as well cancel $xx^{-1}$ first. So this case is settled. On the other hand, since $w_0$ is reduced, the
pair $xx^{-1}$ cannot remain in $w_0$. At least one of the two symbols must be cancelled at some
time. If the pair itself is not cancelled, the first cancellation involving the pair must look like


$$\cdots x^{-1} x x^{-1} \cdots \quad\text{or}\quad \cdots x x^{-1} x \cdots,$$


where one symbol of our pair is cancelled against its other neighbor.
Notice that the word obtained by this cancellation is the same as the one obtained by
cancelling the pair $xx^{-1}$. So at this stage we may cancel the original pair instead. Then we
are back in the first case, so the proposition is proved. □
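Proposition 7.9.2 can be tested experimentally. In the sketch below, words are Python strings in which a lowercase letter stands for a symbol of S and the matching uppercase letter for its inverse; this encoding and the function names are assumptions made for the illustration. Cancellable pairs are removed in a randomly chosen order, and the set of reduced forms reached is collected.

```python
import random

def inv(s):
    """Inverse symbol: 'a' <-> 'A', where 'A' stands for a^{-1}."""
    return s.swapcase()

def cancel_in_random_order(word, rng):
    """Cancel adjacent pairs x x^{-1}, choosing the pair at random each time."""
    w = list(word)
    while True:
        spots = [i for i in range(len(w) - 1) if w[i + 1] == inv(w[i])]
        if not spots:
            return "".join(w)
        i = rng.choice(spots)
        del w[i:i + 2]

rng = random.Random(0)
# The word a b b^{-1} c^{-1} c b from the text.
results = {cancel_in_random_order("abBCcb", rng) for _ in range(100)}
print(results)

# Longer random words: every order of cancellation should give one reduced form.
words = ["".join(rng.choice("aAbB") for _ in range(40)) for _ in range(50)]
unique = all(len({cancel_in_random_order(w, rng) for _ in range(20)}) == 1 for w in words)
print(unique)
```

Whatever order is chosen, only one reduced form appears for each word, as the proposition asserts.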


We call two words w and w’ in W’ equivalent, and we write w ∼ w’, if they have the
same reduced form. This is an equivalence relation.


Proposition 7.9.3 Products of equivalent words are equivalent: If w ∼ w’ and v ∼ v’, then
wv ∼ w’v’.


Proof. To obtain the reduced word equivalent to the product wv, we may first cancel as
much as possible in w and in v, to reduce w to $w_0$ and v to $v_0$. Then wv is reduced to $w_0v_0$.
Now we continue, cancelling in $w_0v_0$ until the word is reduced. If w ∼ w’ and v ∼ v’, the
same process, when applied to w’v’, passes through $w_0v_0$ too, so it leads to the same
reduced word. □


It follows from this proposition that equivalence classes of words can be multiplied: 


Proposition 7.9.4 The set F of equivalence classes of words in W’ is a group, with the law 
of composition induced from multiplication (juxtaposition) in W’. 


Proof. The facts that multiplication is associative and that the class of the empty word 1 is 
an identity follow from the corresponding facts in W’ (see Lemma 2.12.8). We must check 
that all elements of F are invertible. But clearly, if w is the product $xy \cdots z$ of elements of
S’, then the class of $z^{-1} \cdots y^{-1}x^{-1}$ inverts the class of w. □


The group F of equivalence classes of words in S’ is called the free group on the set
S. An element of F corresponds to exactly one reduced word in W’. To multiply reduced
words, combine and cancel: $(abc^{-1})(cb) \rightsquigarrow abc^{-1}cb = abb$.

Power notation may be used: $aaab^{-1}b^{-1} = a^3b^{-2}$.
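The rule "combine and cancel" is short to program. The following sketch uses the same hypothetical encoding as a string of letters, with an uppercase letter standing for the inverse of the lowercase one; `reduce_word`, `multiply`, and `inverse` are illustrative names. Since the reduced form is unique, a single left-to-right pass with a stack is enough.

```python
def reduce_word(word):
    """Reduced form of a word, computed with a stack in one pass."""
    out = []
    for s in word:
        if out and out[-1] == s.swapcase():
            out.pop()          # s cancels against the previous symbol
        else:
            out.append(s)
    return "".join(out)

def multiply(u, v):
    """Product in the free group: juxtapose, then cancel."""
    return reduce_word(u + v)

def inverse(w):
    """(x y ... z)^{-1} = z^{-1} ... y^{-1} x^{-1}."""
    return w[::-1].swapcase()

print(multiply("abC", "cb"))            # (a b c^{-1})(c b)
print(multiply("abC", inverse("abC")))  # a word times its inverse
```

The first product is the word abb of the text, and the second is the empty word, which represents the identity.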


Note: The free group on a set S = {a} of one element is simply an infinite cyclic group. In 
contrast, the free group on a set of two or more elements is quite complicated. 


7.10 GENERATORS AND RELATIONS


Having described free groups, we now consider the more common case, that a set of 
generators of a group is not free — that there are some nontrivial relations among them. 


Definition 7.10.1 A relation among elements $x_1, \ldots, x_n$ of a group G is a word r in the
free group on the set $\{x_1, \ldots, x_n\}$ that evaluates to 1 in G. We will write such a relation
either as r, or for emphasis, as r = 1.


For example, the dihedral group $D_n$ of symmetries of a regular n-sided polygon is
generated by the rotation x with angle $2\pi/n$ and a reflection y, and these generators satisfy
relations that were listed in (6.4.3):


(7.10.2) $x^n = 1$, $y^2 = 1$, $xyxy = 1$.


(The last relation is often written as $yx = x^{-1}y$, but it is best to write every relation in the
form r = 1 here.)

One can use these relations to write the elements of $D_n$ in the form $x^iy^j$ with $0 \le i < n$
and $0 \le j < 2$, and then one can compute the multiplication table for the group. So
the relations determine the group. They are therefore called defining relations. When the 
relations are more complicated, it can be difficult to determine the elements of the group 
and the multiplication table explicitly, but, using the free group and the next lemma, we 
will define the concept of a group generated by a given set of elements, with a given set of 
relations. 
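For the dihedral group the computation is explicit. Moving y past x with the rule $yx = x^{-1}y$ gives $(x^iy^j)(x^ky^l) = x^{i + (-1)^j k}\,y^{j+l}$, with the exponent of x taken modulo n and that of y modulo 2. The sketch below encodes $x^iy^j$ as the pair (i, j); the function name and the choice n = 5 are illustrative.

```python
def dihedral_mul(n):
    """Multiplication of normal forms x^i y^j in D_n, stored as pairs (i, j)."""
    def mul(a, b):
        (i, j), (k, l) = a, b
        # y x^k = x^{-k} y, so passing y^j across x^k changes the sign of k when j = 1
        return ((i + (-1) ** j * k) % n, (j + l) % 2)
    return mul

n = 5
mul = dihedral_mul(n)
e, x, y = (0, 0), (1, 0), (0, 1)

def product(*factors):
    result = e
    for f in factors:
        result = mul(result, f)
    return result

elements = [(i, j) for i in range(n) for j in range(2)]
print(product(*[x] * n), product(y, y), product(x, y, x, y), len(elements))
```

The three defining relations (7.10.2) evaluate to the identity (0, 0), and the 2n pairs form the whole group.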


Lemma 7.10.3 Let R be a subset of a group G. There exists a unique smallest normal 
subgroup N of G that contains R, called the normal subgroup generated by R. If a normal 
subgroup of G contains R, it contains N. The elements of N can be described in either of 
the following ways: 


(a) An element of G is in N if it can be obtained from the elements of R using a finite 
sequence of the operations of multiplication, inversion, and conjugation. 

(b) Let R’ be the set consisting of elements r and $r^{-1}$ with r in R. An element of G is in N
if it can be written as a product $y_1 \cdots y_r$ of some arbitrary length, where each $y_\nu$ is a
conjugate of an element of R’.


Proof. Let N denote the set of elements obtained by a sequence of the operations mentioned 
in (a). A nonempty subset is a normal subgroup if and only if it is closed under those 
operations. Since N is closed under those operations, it is a normal subgroup. Moreover, 
any normal subgroup that contains R must contain N. So the smallest normal subgroup 
containing R exists, and is equal to N. Similar reasoning identifies N as the subset described
in (b). □


As usual, we must take care of the empty set. We say that the empty set generates the trivial 
subgroup {1}. 


Definition 7.10.4 Let F be the free group on a set $S = \{x_1, \ldots, x_n\}$, and let $R = \{r_1, \ldots, r_k\}$
be a set of elements of F. The group generated by S, with relations $r_1 = 1, \ldots, r_k = 1$, is
the quotient group $G = F/\mathcal{R}$, where $\mathcal{R}$ is the normal subgroup of F generated by R.

The group G will often be denoted by

(7.10.5) $\langle x_1, \ldots, x_n \mid r_1, \ldots, r_k \rangle$.

Thus the dihedral group $D_n$ is isomorphic to the group

(7.10.6) $\langle x, y \mid x^n, y^2, xyxy \rangle$.


Example 7.10.7 In the tetrahedral group T of rotational symmetries of a regular tetrahedron,
let x and y denote rotations by $2\pi/3$ about the center of a face and about a vertex, and let z
denote rotation by $\pi$ about the center of an edge, as shown below. With vertices numbered
as in the figure, x acts on the vertices as the permutation (234), y acts as (123), and z acts 
as (13)(24). Computing the product of these permutations shows that xyz acts trivially on 
the vertices. Since the only isometry that fixes all vertices is the identity, xyz = 1. 


(7.10.8) [Figure: a regular tetrahedron with numbered vertices, showing the rotations x, y, and z.]


So the following relations hold in the tetrahedral group: 


(7.10.9) $x^3 = 1$, $y^3 = 1$, $z^2 = 1$, $xyz = 1$. □

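The permutation computation in this example can be reproduced in a few lines. The sketch below stores a permutation of the vertices 1, 2, 3, 4 as a dictionary and composes from right to left, so that the product xyz means "first z, then y, then x", the convention under which the stated permutations multiply to the identity; the helper names are hypothetical.

```python
def perm(*cycles, n=4):
    """Permutation of {1, ..., n} given by disjoint cycles, as a dict."""
    p = {i: i for i in range(1, n + 1)}
    for c in cycles:
        for a, b in zip(c, c[1:] + c[:1]):
            p[a] = b
    return p

def compose(f, g):
    """The product fg acting on the left: apply g first, then f."""
    return {i: f[g[i]] for i in g}

identity = perm()
x = perm((2, 3, 4))
y = perm((1, 2, 3))
z = perm((1, 3), (2, 4))

xyz = compose(x, compose(y, z))
x3 = compose(x, compose(x, x))
y3 = compose(y, compose(y, y))
z2 = compose(z, z)
print(xyz == identity, x3 == identity, y3 == identity, z2 == identity)
```

All four relations of (7.10.9) evaluate to the identity permutation of the vertices.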
Two questions arise: 


1. Is this a set of defining relations for T? In other words, is the group

(7.10.10) $\langle x, y, z \mid x^3, y^3, z^2, xyz \rangle$

isomorphic to T?

It is easy to verify that the rotations x, y, z generate T, but it isn’t particularly easy
to work with the relations. It is confusing enough to list the 12 elements of the group
as products of the generators without repetition. We show in the next section that the
answer to our question is yes, but we don’t do that by writing the elements of the group
explicitly.


2. How can one compute in a group $G = \langle x_1, \ldots, x_n \mid r_1, \ldots, r_k \rangle$ that is presented by
generators and relations?

Because computation in the free group F is easy, the only problem is to decide when an
element w of the free group represents the identity element of G, i.e., when w is an element
of the subgroup $\mathcal{R}$. This is the word problem for G. If we can solve the word problem, then
because the relation $w_1 = w_2$ is equivalent to $w_1^{-1}w_2 = 1$, we will be able to decide when
two elements of the free group represent equal elements of G. This will enable us to compute.
The word problem can be solved in any finite group, but not in every group. However, 
we won’t discuss this point, because some work is required to give a precise meaning to 
the statement that the word problem can or cannot be solved. If you are interested, see 
[Stillwell]. 
The next example shows that computation in $\mathcal{R}$ can become complicated, even in a
relatively simple case. 


Example 7.10.11 The element w = yxyx is equal to 1 in the group T. Let’s verify that w
is in the normal subgroup $\mathcal{R}$ generated by the four relations (7.10.9). We use what you will
recognize as a standard method: reducing w to the identity by the allowed operations.

The relations that we will use are $z^2$ and xyz, and we’ll denote them by p and q,
respectively. First, let $w_1 = y^{-1}wy = xyxy$. Because $\mathcal{R}$ is a normal subgroup, $w_1$ is in $\mathcal{R}$ if
and only if w is. Next, let $w_2 = q^{-1}w_1 = z^{-1}xy$. Since q is in $\mathcal{R}$, $w_2$ is in $\mathcal{R}$ if and only if $w_1$
is. Continuing, $w_3 = zw_2z^{-1} = xyz^{-1}$, $w_4 = q^{-1}w_3 = z^{-1}z^{-1}$, $pw_4 = 1$. Solving back,
$w = yqz^{-1}qp^{-1}zy^{-1}$ is in $\mathcal{R}$. Thus w = 1 in the group (7.10.10). □
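The bookkeeping in this example is easy to get wrong by hand, so it is worth checking by free reduction. The sketch below encodes words as strings with an uppercase letter for the inverse of a generator (an assumed encoding, with illustrative function names), recomputes the chain $w_1, \ldots, w_4$, and confirms that the product of conjugates of p, q and their inverses found by solving back reduces to yxyx in the free group.

```python
def reduce_word(word):
    """Free reduction: cancel adjacent pairs s s^{-1} with a stack."""
    out = []
    for s in word:
        if out and out[-1] == s.swapcase():
            out.pop()
        else:
            out.append(s)
    return "".join(out)

def inverse(word):
    return word[::-1].swapcase()

p, q = "zz", "xyz"          # the relations z^2 and xyz
w = "yxyx"

w1 = reduce_word("Y" + w + "y")          # y^{-1} w y
w2 = reduce_word(inverse(q) + w1)        # q^{-1} w1
w3 = reduce_word("z" + w2 + "Z")         # z w2 z^{-1}
w4 = reduce_word(inverse(q) + w3)        # q^{-1} w3
end = reduce_word(p + w4)                # p w4

# Solving back: w = y q z^{-1} q p^{-1} z y^{-1}
solved = reduce_word("y" + q + "Z" + q + inverse(p) + "z" + "Y")
print(w1, w2, w3, w4, repr(end), solved)
```

The chain ends in the empty word, and the expression obtained by solving back reduces to yxyx, so w is indeed a product of conjugates of the relations and their inverses.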


We return to the group G defined by generators and relations. As with any quotient
group, we have a canonical homomorphism


$$\pi: F \to F/\mathcal{R} = G$$


that sends a word w to the coset $\bar{w} = [w\mathcal{R}]$, and the kernel of $\pi$ is $\mathcal{R}$ (2.12.2). To keep
track of the group in which we are working, it might seem safer to denote the images in G of
elements of F by putting bars over the letters. However, this isn’t customary. When working
in G, one simply remembers that elements $w_1$ and $w_2$ of the free group are equal in G if the
cosets $w_1\mathcal{R}$ and $w_2\mathcal{R}$ are equal, or if $w_1^{-1}w_2$ is in $\mathcal{R}$.

Since the defining relations $r_i$ are in $\mathcal{R}$, $r_i = 1$ is true in G. If we write $r_i$ out as words,
then because $\pi$ is a homomorphism, the corresponding product in G will be equal to 1 (see
Corollary 2.12.3). For instance, xyz = 1 is true in the group $\langle x, y, z \mid x^3, y^3, z^2, xyz \rangle$.


We go back once more to the example of the tetrahedral group and to the first question. 
How is the group $\langle x, y, z \mid x^3, y^3, z^2, xyz \rangle$ related to T? A partial explanation is based on
the mapping properties of free groups and of quotient groups. Both of these properties are 
intuitive. Their proofs are simple enough that we leave them as exercises. 


Proposition 7.10.12 Mapping Property of the Free Group. Let F be the free group on a set
S = {a, b, ...}, and let G be a group. Any map of sets $f: S \to G$ extends in a unique way
to a group homomorphism $\varphi: F \to G$. If we denote the image f(x) of an element x of S
by $\bar{x}$, then $\varphi$ sends a word in $S' = \{a, a^{-1}, b, b^{-1}, \ldots\}$ to the corresponding product of the
elements $\{\bar{a}, \bar{a}^{-1}, \bar{b}, \bar{b}^{-1}, \ldots\}$ in G. □


This property reflects the fact that the elements of S satisfy no relations in F except those
implied by the group axioms. It is the reason for the adjective “free.”


Proposition 7.10.13 Mapping Property of Quotient Groups. Let $\varphi: G' \to G$ be a group
homomorphism with kernel K, and let N be a normal subgroup of G’ that is contained in K.
Let $\bar{G}' = G'/N$, and let $\pi: G' \to \bar{G}'$ be the canonical map $a \rightsquigarrow \bar{a}$. The rule $\bar\varphi(\bar{a}) = \varphi(a)$
defines a homomorphism $\bar\varphi: \bar{G}' \to G$, and $\bar\varphi \circ \pi = \varphi$. □


This mapping property generalizes the First Isomorphism Theorem. The hypothesis that N
be contained in the kernel K is, of course, essential.

The next corollary uses notation introduced previously: $S = \{x_1, \ldots, x_n\}$ is a subset
of a group G, $R = \{r_1, \ldots, r_k\}$ is a set of relations among the elements of S of G, F
is the free group on S, and $\mathcal{R}$ is the normal subgroup of F generated by R. Finally,
$\mathcal{G} = \langle x_1, \ldots, x_n \mid r_1, \ldots, r_k \rangle = F/\mathcal{R}$.


Corollary 7.10.14

(i) There is a canonical homomorphism $\psi: \mathcal{G} \to G$ that sends $x_i \rightsquigarrow x_i$.

(ii) $\psi$ is surjective if and only if the set S generates G.

(iii) $\psi$ is injective if and only if every relation among the elements of S is in $\mathcal{R}$.

Proof. We will prove (i), and omit the verification of (ii) and (iii). The mapping property of
the free group gives us a homomorphism $\varphi: F \to G$ with $\varphi(x_i) = x_i$. Since the relations
$r_i$ evaluate to 1 in G, R is contained in the kernel K of $\varphi$. Since the kernel is a normal
subgroup, $\mathcal{R}$ is also contained in K. Then the mapping property of quotient groups gives us
a map $\bar\varphi: \mathcal{G} \to G$. This is the map $\psi$, and it satisfies $\psi \circ \pi = \varphi$. □


If the map $\psi$ described in the corollary is bijective, one says that R forms a complete
set of relations among the generators S. To decide whether this is true requires knowing
more about G. Going back to the tetrahedral group, the corollary gives us a homomorphism
$\psi: \mathcal{G} \to T$, where $\mathcal{G} = \langle x, y, z \mid x^3, y^3, z^2, xyz \rangle$. It is surjective because x, y, z generate T.
And we saw in Example 7.10.11 that the relation yxyx, which holds among the elements
of T, is in the normal subgroup $\mathcal{R}$ generated by the set $\{x^3, y^3, z^2, xyz\}$. Is every relation
among x, y, z in $\mathcal{R}$? If not, we’d want to add some more relations to our list. It may seem
disappointing not to have the answer to this question yet, but we will see in the next section
that $\psi$ is indeed bijective.


Recapitulating, when we speak of a group defined by generators S and relations R, we
mean the quotient group $G = F/\mathcal{R}$, where F is the free group on S and $\mathcal{R}$ is the normal
subgroup of F generated by R. Any set of relations will define a group. The larger R is, the
larger $\mathcal{R}$ becomes, and the more collapsing takes place in the homomorphism $\pi: F \to G$.
The extreme case is $\mathcal{R} = F$, in which case G is the trivial group. All relations become true in
the trivial group. Problems arise because computation in $F/\mathcal{R}$ may be difficult. But because
generators and relations allow efficient computation in many cases, they are a useful tool.


7.11 THE TODD-COXETER ALGORITHM 


The Todd-Coxeter Algorithm, which is described in this section, is an amazing method for 
determining the operation of a finite group G on the set of cosets of a subgroup H. 
In order to compute, both G and H must be given explicitly. So we consider a group 


(7.11.1) $G = \langle x_1, \ldots, x_m \mid r_1, \ldots, r_k \rangle$


presented by generators and relations, as in the previous section. 
We also assume that the subgroup H of G is given explicitly, by a set of words 


(7.11.2) $\{h_1, \ldots, h_s\}$


in the free group F, whose images in G generate H. 

The algorithm proceeds by constructing some tables that become easier to read when
one works with right cosets Hg. The group G operates by right multiplication on the set of
right cosets, and this changes the order of composition of operations. A product gh acts by
right multiplication as “first multiply by g, then by h”. Similarly, when we want permutations
to operate on the right, we must read a product this way:


   first do this      then this
       (234)     ∘      (123)      = (12)(34).
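The right-to-left and left-to-right conventions give different products, so it helps to fix the convention in code. In this sketch a permutation is a dictionary on {1, 2, 3, 4}, and the hypothetical function `then(f, g)` performs f first and g second, which is how a product must be read when permutations operate on the right.

```python
def perm(*cycles, n=4):
    """Permutation of {1, ..., n} given by disjoint cycles, as a dict."""
    p = {i: i for i in range(1, n + 1)}
    for c in cycles:
        for a, b in zip(c, c[1:] + c[:1]):
            p[a] = b
    return p

def then(f, g):
    """Right action: first do f, then g."""
    return {i: g[f[i]] for i in f}

right_product = then(perm((2, 3, 4)), perm((1, 2, 3)))
left_product = then(perm((1, 2, 3)), perm((2, 3, 4)))   # the other reading
print(right_product == perm((1, 2), (3, 4)), left_product == perm((1, 3), (2, 4)))
```

Reading the product as "first (234), then (123)" gives (12)(34), as displayed; the opposite reading gives (13)(24) instead.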


The following rules suffice to determine the operation of G on the right cosets: 


Rules 7.11.3 


1. The operation of each generator is a permutation. 
2. The relations operate trivially: they fix every coset. 
3. The generators of H fix the coset [H]. 

4. The operation is transitive. 


The first rule follows from the fact that group elements are invertible, and the second one 
reflects the fact that the relations represent the identity element of G. Rules 3 and 4 are 
special properties of the operation on cosets. 

When applying these rules, the cosets are usually denoted by indices 1, 2, 3, ..., with 1
standing for the coset [H]. At the start, one doesn’t know how many indices will be needed;
new ones are added as necessary.

We begin with a simple example, in which we replace $y^3$ by $y^2$ in the relations (7.10.9).


Example 7.11.4 Let G be the group $\langle x, y, z \mid x^3, y^2, z^2, xyz \rangle$, and let H be the cyclic
subgroup <z> generated by z. First, Rule 3 tells us that z sends 1 to itself, $1 \xrightarrow{z} 1$. This
exhausts the information in Rule 3, so Rules 1 and 2 take over. Rule 4 will only appear
implicitly.

Nothing we have done up to now tells us what x does to the index 1. In such a case,
the procedure is simply to assign a new index, $1 \xrightarrow{x} 2$. (Since 1 stands for the coset [H], the
index 2 stands for [Hx], but it is best to ignore this.) Continuing, we don’t know where x
sends the index 2, so we assign a third index, $2 \xrightarrow{x} 3$. Then $1 \xrightarrow{x^2} 3$.

What we have so far is a partial operation, meaning that the operations of some 
generators on some indices have been assigned. It is helpful to keep track of the partial 
operation as one goes along. The partial operation that we have so far is 


z = (1)··· and x = (1 2 3 ···.


There is no closing parenthesis for the partial operation of x because we haven’t determined 
the index to which x sends 3. 

Rule 2 now comes into play. It tells us that because $x^3$ is a relation, it fixes every index.
Since $x^2$ sends 1 to 3, x must send 3 back to 1. It is customary to sum this information up in
a table that exhibits the operation of x on the indices:


     x   x   x
   1   2   3   1


The relation xxx appears on top, and Rule 2 is reflected in the fact that the same index 1
appears at both ends. We have now determined the partial operation


x = (123)---, 
except that we don’t yet know whether or not the indices 1, 2, 3 represent distinct cosets. 


Next, we ask for the operation of y on the index 1. Again, we don’t know it, so we
assign a new index: $1 \xrightarrow{y} 4$. Rule 2 applies again. Since $y^2$ is a relation, y must send 4 back to
1. This is exhibited in the table


     y   y
   1   4   1          so y = (1 4)···.


For review, we have now determined the entries in the table below. The four defining
relations appear on top.


     x   x   x         y   y         z   z         x   y   z
   1   2   3   1     1   4   1     1   1   1     1   2       1


The missing entry in the table for xyz is 1. This follows from the fact that z acts as a
permutation that fixes the index 1. Entering 1 into the table, we see that $2 \xrightarrow{y} 1$. But we also
have $4 \xrightarrow{y} 1$. Therefore 4 = 2. We replace 4 by 2 and continue constructing a table.
The entries below have been determined:


     x   x   x         y   y         z   z         x   y   z
   1   2   3   1     1   2   1     1   1   1     1   2   1   1
   2   3   1   2     2   1   2     2       2     2   3       2
   3   1   2   3     3       3     3       3     3   1   2   3


The third row of the table for xyz shows that $2 \xrightarrow{z} 3$, and this determines the rest of the
table. There are three indices, and the complete operation is


x = (123), y = (12), z = (23).
At the end of the section, we will show that this is indeed the permutation representation
defined by the operation of G on the cosets of H. □


What such a table tells us depends on the particular case. It will always tell us the 
number of cosets, the index [G : H], which will be equal to the number of distinct indices: 
3 in our example. It may also tell us something about the order of the generators. In our 
example, we are given the relation $z^2 = 1$, so the order of z must be 1 or 2. But z acts on
indices as the transposition (23), and this tells us that we can’t have z = 1. So the order of z
is 2, and |H| = 2. The counting formula |G| = |H|[G: H] shows that G has order 2·3 = 6.
The three permutations shown above generate the symmetric group $S_3$, so the permutation
representation $G \to S_3$ defined by this operation is an isomorphism.
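A finished table can be checked against Rules 7.11.3 mechanically. The sketch below takes the permutations x = (123), y = (12), z = (23) of Example 7.11.4, lets words act on indices from left to right (the right operation used in this section), and verifies that the relations fix every index, that the generator z of H fixes the index 1, and that the permutations generate a group of order 6. The function names are illustrative.

```python
perms = {
    "x": {1: 2, 2: 3, 3: 1},   # (123)
    "y": {1: 2, 2: 1, 3: 3},   # (12)
    "z": {1: 1, 2: 3, 3: 2},   # (23)
}
relations = ["xxx", "yy", "zz", "xyz"]
indices = (1, 2, 3)

def act(i, word):
    """Apply the letters of `word` to the index i, reading left to right."""
    for s in word:
        i = perms[s][i]
    return i

rule2 = all(act(i, r) == i for r in relations for i in indices)
rule3 = act(1, "z") == 1

def generated_group():
    """Closure of the three permutations under right multiplication."""
    start = indices
    seen, frontier = {start}, [start]
    while frontier:
        t = frontier.pop()
        for s in perms:
            u = tuple(perms[s][k] for k in t)   # first t, then the generator s
            if u not in seen:
                seen.add(u)
                frontier.append(u)
    return seen

group = generated_group()
rule4 = {t[0] for t in group} == set(indices)   # index 1 can be sent to every index
print(rule2, rule3, rule4, len(group))
```

The table passes all the checks, and the generated group has 6 elements, in agreement with the counting formula.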

If one takes for H the trivial subgroup {1}, the cosets correspond bijectively to the 
group elements, and the permutation representation determines G completely. The cost of 
doing this is that there will be many indices. In other cases, the permutation representation 
may not suffice to determine the order of G. 

We'll compute two more examples. 


Example 7.11.5 We show that the relations (7.10.9) form a complete set of relations for
the tetrahedral group. The verification is simplified a little if one uses the relation xyz = 1
to eliminate the generator z. Since $z^2 = 1$, that relation implies that $xy = z^{-1} = z$. The
remaining elements x, y suffice to generate T. So we substitute z = xy into $z^2$, and replace
the relation $z^2$ by xyxy. The relations become


(7.11.6) $x^3 = 1$, $y^3 = 1$, $xyxy = 1$.


These relations among x and y are equivalent to the relations (7.10.9) among x, y, and z, so
they hold in T.

Let G denote the group $\langle x, y \mid x^3, y^3, xyxy \rangle$. Corollary (7.10.14) gives us a homomorphism
$\psi: G \to T$. To show that (7.11.6) are defining relations for T, we show that $\psi$ is
bijective. Since x and y generate T, $\psi$ is surjective. So it suffices to show that the order of G
is equal to the order of T, which is 12.

We choose the subgroup H = <x>. This subgroup has order 1 or 3 because $x^3$ is one of
the relations. If we show that H has order 3 and that the index of H in G is 4, it will follow
that G has order 12, and we will be done. Here is the resulting table. To fill it in, work from
both ends of the relations.


     x   x   x         y   y   y         x   y   x   y
   1   1   1   1     1   2   3   1     1   1   2   3   1
   2   3   4   2     2   3   1   2     2   3   1   1   2
   3   4   2   3     3   1   2   3     3   4   4   2   3
   4   2   3   4     4   4   4   4     4   2   3   4   4

The permutation representation is 
(7.11.7) x = (234), y= (123). 


Since there are four indices, the index of H is 4. Also, x does have order 3, not 1, because 
the permutation associated to x has order 3. The order of G is 12, as predicted. 
Incidentally, we see that T is isomorphic to the alternating group $A_4$, because the
permutations (7.11.7) generate that group. □


Example 7.11.8 We modify the relations (7.10.9) slightly, to illustrate how “bad” relations
may collapse the group. Let G be the group $\langle x, y \mid x^3, y^3, yxyxy \rangle$, and let H be the
subgroup <y>. Here is a start for a table:


     x   x   x         y   y   y         y   x   y   x   y
   1   2   3   1     1   1   1   1     1   1   2   3   1   1
   2           2     2           2     2   3   1   1   2   2


In the table for yxyxy, the first three entries in the first row are determined by working from
the left, and the last three by working from the right. That row shows that $2 \xrightarrow{y} 3$. The second
row is determined by working from the left, and it shows that $2 \xrightarrow{y} 2$. So 2 = 3. Looking
at the table for xxx, we see that then 2 = 1. There is just one index left, so one coset, and
consequently H = G. The group G is generated by y. It is a cyclic group of order 3. □


Warning: Care is essential when constructing such a table. Any mistake will cause the 
operation to collapse. 


In our examples, we took for H the subgroup generated by one of the generators of
G. If H is generated by a word h, one can introduce a new generator u and the new relation
$u^{-1}h = 1$ (i.e., u = h). Then G (7.11.1) is isomorphic to the group


$$\langle x_1, \ldots, x_m, u \mid r_1, \ldots, r_k, u^{-1}h \rangle,$$


and H becomes the subgroup generated by u. If H has several generators, we do this for 
each of them. 


We now address the question of why the procedure we have described determines the 
operation on cosets. A formal proof of this fact is not possible without first defining the 
algorithm formally, and we have not done this. We will discuss the question informally. (See 
[Todd-Coxeter] for a more complete discussion.) We describe the procedure this way: At a 
given stage of the computation, we will have some set I of indices, and a partial operation on 
I, the operation of some generators on some indices, will have been determined. A partial 
operation need not be consistent with Rules 1, 2, and 3, but it should be transitive; that is,
all indices should be in the “partial orbit” of 1. This is where Rule 4 comes in. It tells us not
to introduce any indices that we don’t need. In the starting position, I is the set {1} of one 
element, and no operations have been assigned. 

At any stage there are two possible steps: 


(7.11.9) (i) We may equate two indices i and j if the rules tell us that they are equal, or
(ii) we may choose a generator x and an index i such that ix has not been determined, and
define ix = j, where j is a new index.


We never equate indices unless their equality is implied by the rules. 
We stop the process when an operation has been determined that is consistent with 
the rules. There are two questions to ask: First, will this procedure terminate? Second, if it
terminates, is the operation the right one? The answer to both questions is yes. It can be
shown that the process does terminate, provided that the group G is finite, and that preference
is given to steps of type (i). We will not prove this. More important for applications is the
fact that, if the process terminates, the resulting permutation representation is the right one.
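
The two steps (7.11.9) can be tried out on a computer. The following is only a minimal sketch, under stated assumptions: it uses a standard row-by-row scanning strategy rather than the hand procedure of the text, it numbers indices from 0 instead of 1, and the names coset_enumeration, define, equate and max_indices are hypothetical. A generator is written as a positive integer g and its inverse as -g.

```python
def coset_enumeration(ngens, relators, subgroup_gens, max_indices=10_000):
    """Enumerate right cosets of H in G = <1..ngens | relators>.

    A word is a list of nonzero ints: g is a generator, -g its inverse.
    Returns {g: perm}, where perm[i] is the index of (coset i) * g.
    """
    letters = [s * g for g in range(1, ngens + 1) for s in (1, -1)]
    table = [{}]     # table[i][g] = index of i*g, where already determined
    parent = [0]     # union-find forest recording which indices were equated

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    def define(i, g):
        # Step (ii): introduce a new index j with i*g = j.
        if len(table) >= max_indices:
            raise RuntimeError("index limit reached; the index may be infinite")
        table.append({-g: i})
        parent.append(len(parent))
        table[i][g] = len(table) - 1

    def equate(a, b):
        # Step (i): equate two indices and follow up all forced consequences.
        queue = []

        def merge(u, v):
            u, v = find(u), find(v)
            if u != v:
                parent[max(u, v)] = min(u, v)
                queue.append(max(u, v))

        merge(a, b)
        while queue:
            dead = queue.pop(0)
            for g in letters:
                d = table[dead].pop(g, None)
                if d is None:
                    continue
                table[d].pop(-g, None)          # drop the matching back-pointer
                u, v = find(dead), find(d)
                if g in table[u]:
                    merge(v, table[u][g])
                elif -g in table[v]:
                    merge(u, table[v][-g])
                else:
                    table[u][g], table[v][-g] = v, u

    def scan_and_fill(i, word):
        # Trace `word` from index i, from both ends, until the row closes.
        while True:
            f, lo = i, 0
            while lo < len(word) and word[lo] in table[f]:
                f, lo = table[f][word[lo]], lo + 1
            if lo == len(word):
                equate(f, i)                    # the word must return to i
                return
            b, hi = i, len(word) - 1
            while hi >= lo and -word[hi] in table[b]:
                b, hi = table[b][-word[hi]], hi - 1
            if hi < lo:
                equate(f, b)                    # the two scans met
                return
            if hi == lo:                        # one gap left: forced entry
                table[f][word[lo]], table[b][-word[lo]] = b, f
                return
            define(f, word[lo])

    for w in subgroup_gens:
        scan_and_fill(0, w)                     # H fixes the index of the coset H
    i = 0
    while i < len(table):
        for w in relators:
            if find(i) == i:
                scan_and_fill(i, w)
        if find(i) == i:
            for g in letters:
                if g not in table[i]:
                    define(i, g)
        i += 1

    live = [k for k in range(len(table)) if find(k) == k]
    new = {old: k for k, old in enumerate(live)}
    return {g: [new[find(table[k][g])] for k in live]
            for g in range(1, ngens + 1)}


x, y = 1, 2
tetrahedral = coset_enumeration(2, [[x, x, x], [y, y, y], [x, y, x, y]], [[x]])
collapsed = coset_enumeration(2, [[x, x, x], [y, y, y], [y, x, y, x, y]], [[y]])
print(len(tetrahedral[x]), len(collapsed[x]))
```

With the relations x^3, y^3, xyxy and H = ⟨x⟩ the sketch finds four cosets, and with the relations of Example 7.11.8 and H = ⟨y⟩ it finds a single coset, in agreement with the hand computations.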


Theorem 7.11.10 Suppose that a finite number of repetitions of steps (i) and (ii) yields a 
consistent table compatible with the rules (7.11.3). Then the table defines a permutation 
representation that, by suitable numbering, is the representation on the right cosets of H in G.


Proof. Say that the group is G = ⟨x_1, ..., x_n | r_1, ..., r_k⟩, and let I* denote the final set of
indices. For each generator x_i, the table determines a permutation of the indices, and the
relations operate trivially. Corollary 7.10.14 gives us a homomorphism from G to the group
of permutations of I*, and therefore an operation, on the right, of G on I* (see Proposition
6.11.2). Provided that we have followed the rules, the table will show that the operation of
G is transitive, and that the subgroup H fixes the index 1.

Let C denote the set of right cosets of H. We prove the theorem by defining a
bijective map φ*: I* → C from I* to C that is compatible with the operations of the group on
the two sets. We define φ* inductively, by defining at each stage a map φ: I → C from the
set of indices determined at that stage to C, compatible with the partial operation on I that
has been determined. To start, φ_0: {1} → C sends 1 ↦ [H]. Suppose that φ: I → C has been
defined, and let I' be the result of applying one of the steps (7.11.9) to I.

In case of step (ii), there is no difficulty in extending φ to a map φ': I' → C. Say that
φ(i) is the coset [Hg], and that the operation of a generator x on i has been defined to be
a new index, say ix = j. Then we define φ'(j) = [Hgx], and we define φ'(k) = φ(k) for all
other indices.

Next, suppose that we use step (i) to equate the indices i and j, so that I is collapsed to
form the new index set I'. The next lemma allows us to define the map φ': I' → C.


Lemma 7.11.11 Suppose that a map φ: I → C is given, compatible with a partial operation on
I. Let i and j be indices in I, and suppose that one of the rules forces i = j. Then φ(i) = φ(j).


Proof. This is true because, as we have remarked before, the operation on cosets does satisfy
the rules. □


The surjectivity of the map φ follows from the fact that the operation of the group on
the set C of right cosets is transitive. As we now verify, the injectivity follows from the facts
that the stabilizer of the coset [H] is the subgroup H, and that the stabilizer of the index 1
contains H. Let i and j be indices. Since the operation on I* is transitive, i = 1a for some
group element a, and then φ(i) = φ(1)a = [Ha]. Similarly, if j = 1b, then φ(j) = [Hb].
Suppose that φ(i) = φ(j), i.e., that Ha = Hb. Then H = Hba^{-1}, so ba^{-1} is an element of
H. Since H stabilizes the index 1, 1 = 1ba^{-1} and i = 1a = 1b = j. □


The method of postulating what we want has many advantages; 
they are the same as the advantages of theft over honest toil. 


—Bertrand Russell 
EXERCISES


Section 1 Cayley’s Theorem 


1.1. Does the rule g * x = xg^{-1} define an operation of G on G?

1.2. Let H be a subgroup of a group G. Describe the orbits for the operation of H on G by
left multiplication.


Section 2 The Class Equation 


2.1. Determine the centralizer and the order of the conjugacy class of
(a) the matrix [entries illegible] in GL_2(F_3), (b) the matrix [entries illegible] in GL_2(F_5).

2.2. A group of order 21 contains a conjugacy class C(x) of order 3. What is the order of x in
the group?

2.3. A group G of order 12 contains a conjugacy class of order 4. Prove that the center of G
is trivial.

2.4. Let G be a group, and let φ be the nth power map: φ(x) = x^n. What can be said about
how φ acts on conjugacy classes?

2.5. Let G be the group of matrices of the form [x y; 0 1], where x, y ∈ R and x > 0. Determine
the conjugacy classes in G, and sketch them in the (x, y)-plane.

2.6. Determine the conjugacy classes in the group M of isometries of the plane.

2.7. Rule out as many as you can, as class equations for a group of order 10:
1+1+1+2+5, 1+2+2+5, 1+2+3+4, 1+1+2+2+2+2.

2.8. Determine the possible class equations of nonabelian groups of order (a) 8, (b) 21.

2.9. Determine the class equation for the following groups: (a) the quaternion group, (b) D_4,
(c) D_5, (d) the subgroup of GL_2(F_3) of invertible upper triangular matrices.

2.10. (a) Let A be an element of SO_3 that represents a rotation with angle π. Describe the
centralizer of A geometrically.
(b) Determine the centralizer of the reflection r about the e_1-axis in the group M of
isometries of the plane.

2.11. Determine the centralizer in GL_3(R) of each matrix:
[matrices illegible]

2.12. Determine all finite groups that contain at most three conjugacy classes.

2.13. Let N be a normal subgroup of a group G. Suppose that |N| = 5 and that |G| is an odd
integer. Prove that N is contained in the center of G.

2.14. The class equation of a group G is 1+4+5+5+5.
(a) Does G have a subgroup of order 5? If so, is it a normal subgroup?
(b) Does G have a subgroup of order 4? If so, is it a normal subgroup?

2.15. Verify the class equation (7.2.10) of SL_2(F_3).


2.16. Let φ: G → G' be a surjective group homomorphism, let C denote the conjugacy class of
an element x of G, and let C' denote the conjugacy class in G' of its image φ(x). Prove
that φ maps C surjectively to C', and that |C'| divides |C|.


2.17. Use the class equation to show that a group of order pq, with p and q prime, contains an
element of order p.


2.18. Which pairs of matrices [matrices illegible] are conjugate elements of (a) GL_n(R),
(b) SL_n(R)?


Section 3 p-Groups


3.1. Prove the Fixed Point Theorem (7.3.2).
3.2. Let Z be the center of a group G. Prove that if G/Z is a cyclic group, then G is abelian,
and therefore G = Z.

3.3. A nonabelian group G has order p^3, where p is prime.
(a) What are the possible orders of the center Z?
(b) Let x be an element of G that isn’t in Z. What is the order of its centralizer Z(x)?
(c) What are the possible class equations for G?

3.4. Classify groups of order 8.


Section 4 The Class Equation of the Icosahedral Group


4.1. The icosahedral group operates on the set of five inscribed cubes in the dodecahedron.
Determine the stabilizer of one of the cubes.

4.2. Is A_5 the only proper normal subgroup of S_5?
4.3. What is the centralizer of an element of order 2 of the icosahedral group I?

4.4. (a) Determine the class equation of the tetrahedral group T.
(b) Prove that T has a normal subgroup of order 4, and no subgroup of order 6.
4.5. (a) Determine the class equation of the octahedral group O.
(b) This group contains two proper normal subgroups. Find them, show that they are
normal, and show that there are no others.
4.6. (a) Prove that the tetrahedral group T is isomorphic to the alternating group A_4, and
that the octahedral group O is isomorphic to the symmetric group S_4.
Hint: Find sets of four elements on which the groups operate.
(b) Two tetrahedra can be inscribed into a cube C, each one using half the vertices.
Relate this to the inclusion A_4 ⊂ S_4.

4.7. Let G be a group of order n that operates nontrivially on a set of order r. Prove that if
n > r!, then G has a proper normal subgroup.
4.8. (a) Suppose that the centralizer Z(x) of a group element x has order 4. What can be
said about the center of the group?
(b) Suppose that the conjugacy class C(y) of an element y has order 4. What can be said
about the center of the group?
4.9. Let x be an element of a group G, not the identity, whose centralizer Z(x) has order pq,
where p and q are primes. Prove that Z(x) is abelian.


Section 5 Conjugation in the Symmetric Group


5.1. (a) Prove that the transpositions (12), (23), ..., (n−1, n) generate the symmetric
group S_n.
(b) How many transpositions are needed to write the cycle (123···n)?
(c) Prove that the cycles (12···n) and (12) generate the symmetric group S_n.

5.2. What is the centralizer of the element (12) in S_5?
5.3. Determine the orders of the elements of the symmetric group S_7.

5.4. Describe the centralizer Z(σ) of the permutation σ = (153)(246) in the symmetric
group S_7, and compute the orders of Z(σ) and of C(σ).

5.5. Let p and q be permutations. Prove that the products pq and qp have cycles of equal
sizes.

5.6. Find all subgroups of S_4 of order 4, and decide which ones are normal.

5.7. Prove that A_n is the only subgroup of S_n of index 2.

5.8. Determine the integers n such that there is a surjective homomorphism from the
symmetric group S_n to S_{n−1}.

5.9. Let q be a 3-cycle in S_n. How many even permutations p are there such that pqp^{-1} = q?

5.10. Verify formulas (7.5.2) and (7.5.3) for the class equations of S_4 and S_5, and determine
the centralizer of a representative element in each conjugacy class.

5.11. (a) Let C be the conjugacy class of an even permutation p in S_n. Show that C is either
a conjugacy class in A_n, or else the union of two conjugacy classes in A_n of equal
order. Explain how to decide which case occurs in terms of the centralizer of p.
(b) Determine the class equations of A_4 and A_5.
(c) One may also decompose the conjugacy classes of permutations of odd order into
A_n-orbits. Describe this decomposition.

5.12. Determine the class equations of S_6 and A_6.


Section 6 Normalizers


6.1. Prove that the subgroup B of invertible upper triangular matrices in GL_n(R) is conjugate
to the subgroup L of invertible lower triangular matrices.

6.2. Let B be the subgroup of G = GL_n(C) of invertible upper triangular matrices, and
let U ⊂ B be the set of upper triangular matrices with diagonal entries 1. Prove that
B = N(U) and that B = N(B).
*6.3. Let P denote the subgroup of GL_n(R) consisting of the permutation matrices. Determine
the normalizer N(P).

6.4. Let H be a normal subgroup of prime order p in a finite group G. Suppose that p
is the smallest prime that divides the order of G. Prove that H is in the
center Z(G).


Suggested by Ivan Borsenko.
6.5. Let p be a prime integer and let G be a p-group. Let H be a proper subgroup of G.
Prove that the normalizer N(H) of H is strictly larger than H, and that H is contained
in a normal subgroup of index p.


*6.6. Let H be a proper subgroup of a finite group G. Prove: 


(a) The group G is not the union of the conjugate subgroups of H. 
(b) There is a conjugacy class C that is disjoint from H. 


Section 7 The Sylow Theorems 


7.1. Let n = p^e m, as in (4.5.1), and let N be the number of subsets of order p^e in a set of
order n. Determine the congruence class of N modulo p.

7.2. Let G_1 ⊂ G_2 be groups whose orders are divisible by p, and let H_1 be a Sylow p-subgroup
of G_1. Prove that there is a Sylow p-subgroup H_2 of G_2 such that H_1 = H_2 ∩ G_1.

7.3. How many elements of order 5 might be contained in a group of order 20?
7.4. (a) Prove that no simple group has order pq, where p and q are prime.
(b) Prove that no simple group has order p^2 q, where p and q are prime.
7.5. Find Sylow 2-subgroups of the following groups: (a) D_10, (b) T, (c) O, (d) I.
7.6. Exhibit a subgroup of the symmetric group S_7 that is a nonabelian group of order 21.

7.7. Let n = pm be an integer that is divisible exactly once by p, and let G be a group of order
n. Let H be a Sylow p-subgroup of G, and let S be the set of all Sylow p-subgroups.
Explain how S decomposes into H-orbits.

*7.8. Compute the order of GL_n(F_p). Find a Sylow p-subgroup of GL_n(F_p), and determine
the number of Sylow p-subgroups.

7.9. Classify groups of order (a) 33, (b) 18, (c) 20, (d) 30.
7.10. Prove that the only simple groups of order < 60 are the groups of prime order.


Section 8 The Groups of Order 12 


8.1. Which of the groups of order 12 described in Theorem 7.8.1 is isomorphic to S_3 × C_2?

8.2. (a) Determine the smallest integer n such that the symmetric group S_n contains a
subgroup isomorphic to the group (7.8.2).
(b) Find a subgroup of SL_2(F_5) that is isomorphic to that group.

8.3. Determine the class equations of the groups of order 12.
8.4. Prove that a group of order n = 2p, where p is prime, is either cyclic or dihedral.
8.5. Let G be a nonabelian group of order 28 whose Sylow 2-subgroups are cyclic.
(a) Determine the numbers of Sylow 2-subgroups and of Sylow 7-subgroups.
(b) Prove that there is at most one isomorphism class of such groups.
(c) Determine the numbers of elements of each order, and the class equation of G.

8.6. Let G be a group of order 55.
(a) Prove that G is generated by two elements x and y, with the relations x^11 = 1,
y^5 = 1, yxy^{-1} = x^r for some r, 1 ≤ r < 11.
(b) Decide which values of r are possible.
(c) Prove that there are two isomorphism classes of groups of order 55.


Section 9 The Free Group


9.1. Let F be the free group on {x, y}. Prove that the three elements u = x^2, v = y^2, and
z = xy generate a subgroup isomorphic to the free group on u, v, and z.

9.2. We may define a closed word in S' to be the oriented loop obtained by joining the ends
of a word. Reading counterclockwise,


a c 
bbd 


is a closed word. Establish a bijective correspondence between reduced closed words and 
conjugacy classes in the free group. 


Section 10 Generators and Relations 


10.1. Prove the mapping properties of free groups and of quotient groups. 


10.2. Let φ: G → G' be a surjective group homomorphism. Let S be a subset of G whose
image φ(S) generates G', and let T be a set of generators of ker φ. Prove that S ∪ T
generates G.

10.3. Can every finite group G be presented by a finite set of generators and a finite set of
relations?

10.4. The group G = ⟨x, y | xyx^{-1}y^{-1}⟩ is called a free abelian group. Prove a mapping
property of this group: If u and v are elements of an abelian group A, there is a unique
homomorphism φ: G → A such that φ(x) = u, φ(y) = v.

10.5. Prove that the group generated by x, y, z with the single relation yxyz^{-2} = 1 is actually
a free group.

10.6. A subgroup H of a group G is characteristic if it is carried to itself by all automorphisms
of G.
(a) Prove that every characteristic subgroup is normal, and that the center Z is a
characteristic subgroup.
(b) Determine the normal subgroups and the characteristic subgroups of the quaternion
group.

10.7. The commutator subgroup C of a group G is the smallest subgroup that contains all
commutators. Prove that the commutator subgroup is a characteristic subgroup (see
Exercise 10.6), and that G/C is an abelian group.

10.8. Determine the commutator subgroups (Exercise 10.7) of the following groups:
(a) SO_2, (b) O_2, (c) the group M of isometries of the plane, (d) S_n, (e) SO_3.

10.9. Let G denote the group of 3 × 3 upper triangular matrices with diagonal entries equal to 1
and with entries in the field F_p. For each prime p, determine the center, the commutator
subgroup (Exercise 10.7), and the orders of the elements of G.

10.10. Let F be the free group on x, y and let R be the smallest normal subgroup containing
the commutator xyx^{-1}y^{-1}.
(a) Show that x^2y^2x^{-2}y^{-2} is in R.
(b) Prove that R is the commutator subgroup (Exercise 10.7) of F.


Section 11 The Todd-Coxeter Algorithm 


11.1. Complete the proof that the group given in Example 7.11.8 is cyclic of order 3.

11.2. Use the Todd-Coxeter algorithm to show that the group defined by the relations (7.8.2)
has order 12 and that the group defined by the relations (7.7.8) has order 21.

11.3. Use the Todd-Coxeter Algorithm to analyze the group generated by two elements x, y,
with the following relations. Determine the order of the group and identify the group if
you can:

QP =y=l,xyx= yxy, (b) x8 = y' =1, xyx = yxy,

()x4=y=1,xyx= yxy, (d)xt=yf=x’y =1,

(e) 8 =1,y =1, yxyxy=1, (28 =y = yxyxy=1,

(g) x*=1,y=1,xy=y'x, (h) x’ =1,y=1, yx=x’y,
@xtyxsylylysxt, My =Lxyxy=1.

11.4. How is normality of a subgroup H of G reflected in the table that displays the operation
on cosets?

11.5. Let G be the group generated by elements x, y, with relations x^4 = 1, y^3 = 1, x^2 = yxy.
Prove that this group is trivial in two ways: using the Todd-Coxeter Algorithm, and
working directly with the relations.

11.6. A triangle group G^{pqr} is a group ⟨x, y, z | x^p, y^q, z^r, xyz⟩, where p ≤ q ≤ r are positive
integers. In each case, prove that the triangle group is isomorphic to the group listed.
(a) the dihedral group D_n, when p, q, r = 2, 2, n,
(b) the octahedral group, when p, q, r = 2, 3, 4,
(c) the icosahedral group, when p, q, r = 2, 3, 5.

11.7. Let Δ denote an equilateral triangle, and let a, b, c denote the reflections of the plane
about the three sides of Δ. Let x = ab, y = bc, z = ca. Prove that x, y, z generate a
triangle group (Exercise 11.6).

11.8. (a) Prove that the group G generated by elements x, y, z with relations x^2 = y^3 = z^5 =
1, xyz = 1 has order 60.
(b) Let H be the subgroup generated by x and zyz^{-1}. Determine the permutation
representation of G on G/H, and identify H.
(c) Prove that G is isomorphic to the alternating group A_5.
(d) Let K be the subgroup of G generated by x and yxz. Determine the permutation
representation of G on G/K, and identify K.


Miscellaneous Problems 


M.1. Classify groups that are generated by two elements x and y of order 2.
Hint: It will be convenient to make use of the element z = xy.

M.2. With the presentation (6.4.3), determine the double cosets (see Exercise M.9) HgH of
the subgroup H = {1, y} in the dihedral group D_n. Show that each double coset has
either two or four elements.

*M.3. (a) Suppose that a group G operates transitively on a set S, and that H is the stabilizer
of an element s_0 of S. Consider the operation of G on S × S defined by g(s_1, s_2) =
(gs_1, gs_2). Establish a bijective correspondence between double cosets of H in G
and G-orbits in S × S.
(b) Work out the correspondence explicitly for the case that G is the dihedral group D_5
and S is the set of vertices of a pentagon.
(c) Work it out for the case that G = T and that S is the set of edges of a tetrahedron.

*M.4. Let H and K be subgroups of a group G, with H ⊂ K. Suppose that H is normal in K,
and that K is normal in G. Is H normal in G?

M.5. Let H and N be subgroups of a group G, and assume that N is a normal subgroup.
(a) Determine the kernels of the restrictions of the canonical homomorphism π: G →
G/N to the subgroups H and HN.
(b) Applying the First Isomorphism Theorem to these restrictions, prove the Second
Isomorphism Theorem: H/(H ∩ N) is isomorphic to (HN)/N.

M.6. Let H and N be normal subgroups of a group G such that H ⊃ N. Let H̄ = H/N and
Ḡ = G/N.
(a) Prove that H̄ is a normal subgroup of Ḡ.
(b) Use the composed homomorphism G → Ḡ → Ḡ/H̄ to prove the
Third Isomorphism Theorem: G/H is isomorphic to Ḡ/H̄.

M.7. *Let p_1, p_2 be permutations of the set S = {1, 2, ..., n}, and let U_i be the subset of S of
indices that are not fixed by p_i. Prove:
(a) If U_1 ∩ U_2 = ∅, the commutator p_1 p_2 p_1^{-1} p_2^{-1} is the identity.
(b) If U_1 ∩ U_2 contains exactly one element, the commutator p_1 p_2 p_1^{-1} p_2^{-1} is a three-cycle.

*M.8. Let H be a subgroup of a group G. Prove that the number of left cosets is equal to the
number of right cosets also when G is an infinite group.

M.9. Let x be an element, not the identity, of a group of odd order. Prove that the elements x
and x^{-1} are not conjugate.


Suggested by Benedict Gross.
M.10. Let G be a finite group that operates transitively on a set S of order ≥ 2. Show that G
contains an element g that doesn’t fix any element of S.

M.11. Determine the conjugacy classes of elements of order 2 in GL_2(Z).
*M.12. (class equation of SL_2) Many, though not all, conjugacy classes in SL_2(F) contain
matrices of the form A = [0 −1; 1 a].
(a) Determine the centralizers in SL_2(F_5) of the matrices A, for a = 0, 1, 2, 3, 4.
(b) Determine the class equation of SL_2(F_5).
(c) How many solutions of an equation of the form x^2 + axy + y^2 = 1 in F_p might there
be? To analyze this, one can begin by setting y = λx + 1. For most values of λ there
will be two solutions, one of which is x = 0, y = 1.
(d) Determine the class equation of SL_2(F_p).


CHAPTER 8 


Bilinear Forms 


I presume that to the uninitiated
the formulae will appear cold and cheerless. 


—Benjamin Pierce 


8.1 BILINEAR FORMS 


The dot product (X · Y) = X^t Y = x_1y_1 + ··· + x_ny_n on R^n was discussed in Chapter 5.
It is symmetric: (Y · X) = (X · Y), and positive definite: (X · X) > 0 for every X ≠ 0. We
examine several analogues of dot product in this chapter. The most important ones are 
symmetric forms and Hermitian forms. All vector spaces in this chapter are assumed to be 
finite-dimensional. 

Let V be a real vector space. A bilinear form on V is a real-valued function of two
vector variables - a map V × V → R. Given a pair v, w of vectors, the form returns a real
number that will usually be denoted by (v, w). A bilinear form is required to be linear in
each variable:


(8.1.1) (rv_1, w_1) = r(v_1, w_1) and (v_1 + v_2, w_1) = (v_1, w_1) + (v_2, w_1)
        (v_1, rw_1) = r(v_1, w_1) and (v_1, w_1 + w_2) = (v_1, w_1) + (v_1, w_2)


for all v_i and w_i in V and all real numbers r. Another way to say this is that the form is
compatible with linear combinations in each variable:


(8.1.2) (Σ x_i v_i, w) = Σ x_i (v_i, w)
        (v, Σ w_j y_j) = Σ (v, w_j) y_j


for all vectors v_i and w_j and all real numbers x_i and y_j. (It is often convenient to bring
scalars in the second variable out to the right side.)
The form on R^n defined by


(8.1.3) (X, Y) = X^t A Y,


where A is an n × n matrix, is an example of a bilinear form. The dot product is the case
A = I, and when one is working with real column vectors, one always assumes that the form
is dot product unless a different form has been specified.

If a basis B = (v_1, ..., v_n) of V is given, a bilinear form ( , ) can be related to a form
of the type (8.1.3) by the matrix of the form. This matrix is simply A = (a_ij), where

(8.1.4) a_ij = (v_i, v_j).

Proposition 8.1.5 Let ( , ) be a bilinear form on a vector space V, let B = (v_1, ..., v_n) be a
basis of V, and let A be the matrix of the form with respect to that basis. If X and Y are the
coordinate vectors of the vectors v and w, respectively, then


(v, w) = X^t A Y.


Proof. If v = BX and w = BY, then


(v, w) = (Σ_i v_i x_i, Σ_j v_j y_j) = Σ_{i,j} x_i (v_i, v_j) y_j = Σ_{i,j} x_i a_ij y_j = X^t A Y. □
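
Proposition 8.1.5 is easy to check numerically. The sketch below is only an illustration: the bilinear form sample_form, the basis, and the helper names form_matrix, evaluate and combine are all hypothetical choices, not notation from the text. It builds the matrix A from (8.1.4) and compares (v, w) with X^t A Y.

```python
def form_matrix(form, basis):
    """Matrix A = (a_ij) with a_ij = form(v_i, v_j), as in (8.1.4)."""
    return [[form(vi, vj) for vj in basis] for vi in basis]

def evaluate(A, X, Y):
    """Compute X^t A Y for coordinate vectors X and Y."""
    n = len(A)
    return sum(X[i] * A[i][j] * Y[j] for i in range(n) for j in range(n))

def combine(basis, coords):
    """Return the vector BX = v_1 x_1 + ... + v_n x_n."""
    dim = len(basis[0])
    return [sum(v[k] * x for v, x in zip(basis, coords)) for k in range(dim)]

def sample_form(v, w):
    # Hypothetical bilinear form on R^2, chosen only for illustration.
    return v[0] * w[0] + 2 * v[0] * w[1] + 3 * v[1] * w[1]

basis = [[1, 1], [0, 2]]              # v_1 = (1, 1)^t, v_2 = (0, 2)^t
A = form_matrix(sample_form, basis)
X, Y = [2, -1], [1, 3]                # coordinate vectors of v and w
v, w = combine(basis, X), combine(basis, Y)
print(A, sample_form(v, w), evaluate(A, X, Y))
```

Here the form is not symmetric, so neither is A, but the two ways of computing (v, w) still agree.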


A bilinear form is symmetric if (v, w) = (w, v) for all v and w in V, and skew-
symmetric if (v, w) = -(w, v) for all v and w in V. When we refer to a symmetric form, we
mean a bilinear symmetric form, and similarly, reference to a skew-symmetric form implies 
bilinearity. 


Lemma 8.1.6 


(a) Let A be an n × n matrix. The form X^t A Y is symmetric: X^t A Y = Y^t A X for all X and Y,
if and only if the matrix A is symmetric: A^t = A.

(b) A bilinear form ( , ) is symmetric if and only if its matrix with respect to an arbitrary
basis is a symmetric matrix.


The analogous statements are true when the word symmetric is replaced by skew-symmetric. 


Proof. (a) Assume that A = (a_ij) is a symmetric matrix. Thinking of X^t A Y as a 1 × 1 matrix,
it is equal to its transpose. Then X^t A Y = (X^t A Y)^t = Y^t A^t X = Y^t A X. Thus the form is
symmetric. To derive the other implication, we note that e_i^t A e_j = a_ij, while e_j^t A e_i = a_ji. In
order for the form to be symmetric, we must have a_ij = a_ji.


(b) This follows from (a) because (v, w) = X^t A Y. □


The effect of a change of basis on the matrix of a form is determined in the usual way. 


Proposition 8.1.7 Let ( , ) be a bilinear form on a real vector space V, and let A and A’ be 
the matrices of the form with respect to two bases B and B’. If P is the matrix of change of 
basis, so that B’ = BP, then 


A' = P^t A P.


Proof. Let X and X' be the coordinate vectors of a vector v with respect to the bases B and
B'. Then v = BX = B'X', and PX' = X. With analogous notation, w = BY = B'Y',


(v, w) = X^t A Y = (PX')^t A (PY') = X'^t (P^t A P) Y'.


This identifies P^t A P as the matrix of the form with respect to the basis B'. □

Corollary 8.1.8 Let A be the matrix of a bilinear form with respect to a basis. The matrices
that represent the same form with respect to different bases are the matrices P^t A P, where P
can be any invertible matrix. □


Note: There is an important observation to be made here. When a basis is given, both linear
operators and bilinear forms are described by matrices. It may be tempting to think that
the theories of linear operators and of bilinear forms are equivalent in some way. They are
not equivalent. When one makes a change of basis, the matrix of the bilinear form X^t A Y
changes to P^t A P, while the matrix of the linear operator Y = AX changes to P^{-1} A P. The
matrices obtained with respect to the new basis will most often be different. □
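
To see the two change-of-basis rules side by side, here is a small sketch in exact rational arithmetic. The matrices A and P are arbitrary illustrative choices, and the helper names matmul, transpose and inverse_2x2 are hypothetical.

```python
from fractions import Fraction

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [1, 3]]      # illustrative matrix
P = [[1, 1], [0, 2]]      # illustrative invertible change-of-basis matrix

as_form = matmul(matmul(transpose(P), A), P)        # P^t A P
as_operator = matmul(matmul(inverse_2x2(P), A), P)  # P^{-1} A P
print(as_form)
print(as_operator)
```

The same A and P give different matrices under the two rules. The operator rule preserves the trace (5 in both A and P^{-1} A P), while the form rule preserves symmetry but not the trace.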


8.2 SYMMETRIC FORMS


Let V be a real vector space. A symmetric form on V is positive definite if (v, v) > 0 for all
nonzero vectors v, and positive semidefinite if (v, v) ≥ 0 for all nonzero vectors v. Negative
definite and negative semidefinite forms are defined analogously. Dot product is a symmetric,
positive definite form on R^n.

A symmetric form that is neither positive semidefinite nor negative semidefinite is called
indefinite. The Lorentz form


(8.2.1) (X, Y) = x_1y_1 + x_2y_2 + x_3y_3 − x_4y_4


is an indefinite symmetric form on “space-time” R^4, where x_4 is the “time” coordinate, and
the speed of light is normalized to 1. Its matrix with respect to the standard basis of R^4 is


          [ 1           ]
(8.2.2)   [    1        ]
          [       1     ]
          [          -1 ]
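
As a quick check that the Lorentz form takes both signs, one can evaluate it on standard basis vectors. This is a worked instance that uses only formula (8.2.1); the angle brackets stand for the form.

```latex
\begin{aligned}
\langle e_1, e_1\rangle &= 1\cdot 1 + 0 + 0 - 0 = 1 > 0,\\
\langle e_4, e_4\rangle &= 0 + 0 + 0 - 1\cdot 1 = -1 < 0,\\
\langle e_1 + e_4,\; e_1 + e_4\rangle &= 1 + 0 + 0 - 1 = 0 .
\end{aligned}
```

So the form is neither positive nor negative semidefinite, and a nonzero vector such as e_1 + e_4 can pair with itself to zero.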


As an introduction to the study of symmetric forms, we ask what happens to dot
product when we change coordinates. The effect of the change of basis from the standard
basis E to a new basis B' is given by Proposition 8.1.7. If B' = EP, the matrix I of dot product
changes to A' = P^t I P = P^t P, or in terms of the form, if PX' = X and PY' = Y, then


(8.2.3) X^t Y = X'^t A' Y', where A' = P^t P.


If the change of basis is orthogonal, then P^t P is the identity matrix, and (X · Y) = (X' · Y').
But under a general change of basis, the formula for dot product changes as indicated.

This raises a question: Which of the bilinear forms X^t A Y are equivalent to dot product,
in the sense that they represent dot product with respect to some basis of R^n? Formula
(8.2.3) gives a theoretical answer:


Corollary 8.2.4 The matrices A that represent a form (X, Y) = X^t A Y equivalent to dot
product are those that can be written as a product P^t P, for some invertible matrix P. □


This answer won’t be satisfactory until we can decide which matrices A can be written
as such a product. One condition that A must satisfy is very simple: It must be
symmetric, because P^t P is always a symmetric matrix. Another condition comes from the
fact that dot product is positive definite.

In analogy with the terminology for symmetric forms, a symmetric real matrix A is
called positive definite if X^t A X > 0 for all nonzero column vectors X. If the form X^t A Y is
equivalent to dot product, the matrix A will be positive definite.

The two conditions, symmetry and positive definiteness, characterize matrices that 
represent dot product. 


Theorem 8.2.5 The following properties of a real n × n matrix A are equivalent:


(i) The form X^t A Y represents dot product, with respect to some basis of R^n.
(ii) There is an invertible matrix P such that A = P^t P.
(iii) The matrix A is symmetric and positive definite.


We have seen that (i) and (ii) are equivalent (Corollary 8.2.4) and that (i) implies (iii). 
We will prove that (iii) implies (i) in Section 8.4 (see (8.4.18)). 
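
For a concrete matrix, a factorization A = P^t P as in (ii) can be computed directly. The sketch below uses the Cholesky factorization, a standard numerical technique that is swapped in here for illustration and is not the argument given in Section 8.4. The function name cholesky_upper is hypothetical, and floating-point arithmetic is assumed to be accurate enough for the small example.

```python
import math

def cholesky_upper(A):
    """Return an upper triangular P with A = P^t P.

    Raises ValueError if A is not symmetric or not positive definite.
    """
    n = len(A)
    if any(A[i][j] != A[j][i] for i in range(n) for j in range(n)):
        raise ValueError("matrix is not symmetric")
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Entry (i, j) of P^t P is the sum over k <= i of P[k][i] * P[k][j].
            s = A[i][j] - sum(P[k][i] * P[k][j] for k in range(i))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                P[i][i] = math.sqrt(s)
            else:
                P[i][j] = s / P[i][i]
    return P

P = cholesky_upper([[4, 2], [2, 5]])
print(P)

try:
    cholesky_upper([[1, 0], [0, -1]])   # an indefinite symmetric matrix
    indefinite_rejected = False
except ValueError:
    indefinite_rejected = True
print(indefinite_rejected)
```

For the matrix with rows (4, 2) and (2, 5) the factor P has rows (2, 1) and (0, 2), and one checks by hand that P^t P gives back the matrix. The indefinite matrix is rejected, as condition (iii) requires.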


8.3 HERMITIAN FORMS


The most useful way to extend the concept of symmetric forms to complex vector spaces is
to Hermitian forms. A Hermitian form on a complex vector space V is a map V × V → C,
denoted by (v, w), that is conjugate linear in the first variable, linear in the second variable,
and Hermitian symmetric:


(8.3.1) (cv_1, w_1) = \bar{c}(v_1, w_1) and (v_1 + v_2, w_1) = (v_1, w_1) + (v_2, w_1)
        (v_1, cw_1) = c(v_1, w_1) and (v_1, w_1 + w_2) = (v_1, w_1) + (v_1, w_2)
        (w_1, v_1) = \overline{(v_1, w_1)}


for all v_i and w_i in V, and all complex numbers c, where the overline denotes complex
conjugation. As with bilinear forms (8.1.2), this condition can be expressed in terms of linear
combinations in the variables:


(8.3.2) (Σ x_i v_i, w) = Σ \bar{x}_i (v_i, w)
        (v, Σ w_j y_j) = Σ (v, w_j) y_j


for any vectors v_i and w_j and any complex numbers x_i and y_j. Because of Hermitian
symmetry, (v, v) = \overline{(v, v)}, and therefore (v, v) is a real number, for all vectors v.
The standard Hermitian form on C^n is the form


(8.3.3) (X, Y) = X*Y = \bar{x}_1 y_1 + ··· + \bar{x}_n y_n,


where the notation X* stands for the conjugate transpose (\bar{x}_1, ..., \bar{x}_n) of X = (x_1, ..., x_n)^t.
When working with C”, one always assumes that the form is the standard Hermitian form, 
unless another form has been specified. 

The reason that the complication caused by complex conjugation is introduced is that
(X, X) becomes a positive real number for every nonzero complex vector X. If we use
the bijective correspondence of complex n-dimensional vectors with real 2n-dimensional
vectors, by

(8.3.4) (x_1, ..., x_n)^t ↔ (a_1, b_1, ..., a_n, b_n)^t,

where x_ν = a_ν + b_ν i, then \bar{x}_ν = a_ν − b_ν i and

(X, X) = \bar{x}_1 x_1 + ··· + \bar{x}_n x_n = a_1^2 + b_1^2 + ··· + a_n^2 + b_n^2.


Thus (X, X) is the square length of the corresponding real vector, a positive real number.

For arbitrary vectors X and Y, the symmetry property of dot product is replaced by
Hermitian symmetry: (Y, X) = \overline{(X, Y)}. Bear in mind that when X ≠ Y, (X, Y) is likely to
be a complex number, whereas dot product of the corresponding real vectors would be
real. Though elements of C^n correspond bijectively to elements of R^{2n}, as above, these two
vector spaces aren’t equivalent, because scalar multiplication by a complex number isn’t
defined on R^{2n}.
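
These properties of the standard Hermitian form can be checked with Python's built-in complex numbers. The sketch is only an illustration: the vectors are arbitrary, and the helper names hermitian_product and realify are hypothetical.

```python
def hermitian_product(X, Y):
    """Standard Hermitian form (8.3.3): sum of conj(x_k) * y_k."""
    return sum(x.conjugate() * y for x, y in zip(X, Y))

def realify(X):
    """Correspondence (8.3.4): (x_1, ..., x_n) -> (a_1, b_1, ..., a_n, b_n)."""
    out = []
    for x in X:
        out += [x.real, x.imag]
    return out

def dot(U, V):
    return sum(u * v for u, v in zip(U, V))

X = [1 + 2j, 3 - 1j]
Y = [2j, 1 + 1j]

print(hermitian_product(X, X), dot(realify(X), realify(X)))
print(hermitian_product(X, Y), hermitian_product(Y, X))
print(dot(realify(X), realify(Y)))
```

In this example (X, X) equals the square length 15 of the real vector, (Y, X) is the complex conjugate of (X, Y) = 6 + 6i, and the real dot product of the corresponding real vectors is 6, the real part of (X, Y).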


The adjoint A* of a complex matrix A = (a_ij) is the complex conjugate of the transpose
matrix A^t, a notation that was used above for column vectors. So the i, j entry of A* is \bar{a}_ji.


For example, [1 1+i; 2 i]* = [1 2; 1−i −i].


Here are some rules for computing with adjoint matrices: 

(8.3.5) $(cA)^* = \bar{c}A^*$, $(A+B)^* = A^* + B^*$, $(AB)^* = B^*A^*$, $A^{**} = A$.

A square matrix A is Hermitian (or self-adjoint) if

(8.3.6) $A^* = A$.
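
The rules for adjoints are easy to test on small matrices. This is a sketch only: matrices are stored as lists of rows, and the helper names `adjoint`, `matmul`, and `is_hermitian`, as well as the two sample matrices, are assumptions made for the illustration.

```python
def adjoint(A):
    """Conjugate transpose: the (i, j) entry of A* is the conjugate of a_ji."""
    rows, cols = len(A), len(A[0])
    return [[A[j][i].conjugate() for j in range(rows)] for i in range(cols)]

def matmul(A, B):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_hermitian(A):
    """A is Hermitian when it equals its own adjoint."""
    return A == adjoint(A)

A = [[1, 1 + 1j], [2, 1j]]       # not Hermitian
B = [[2, 1j], [-1j, 3]]          # Hermitian: real diagonal, conjugate off-diagonal

print(adjoint(A))
print(adjoint(matmul(A, B)) == matmul(adjoint(B), adjoint(A)))   # (AB)* = B*A*
print(is_hermitian(A), is_hermitian(B))
```

Note the reversal of order in the product rule; the check fails if the factors on the right are written as A*B*.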


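The entries of a Hermitian matrix A satisfy the relation $a_{ji} = \bar{a}_{ij}$. Its diagonal entries are
real and the entries below the diagonal are the complex conjugates of those above it:


(8.3.7) $A = \begin{bmatrix} r_1 & & a_{ij} \\ & \ddots & \\ \bar{a}_{ij} & & r_n \end{bmatrix}, \quad r_i \in \mathbb{R},\ a_{ij} \in \mathbb{C}$.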


For example, $\begin{bmatrix} 2 & i \\ -i & 3 \end{bmatrix}$ is a Hermitian matrix. A real matrix is Hermitian if and only if it is
symmetric.

The matrix of a Hermitian form with respect to a basis $B = (v_1, \ldots, v_n)$ is defined as
for bilinear forms. It is $A = (a_{ij})$, where $a_{ij} = (v_i, v_j)$. The matrix of the standard Hermitian
form on $\mathbb{C}^n$ is the identity matrix.


Proposition 8.3.8 Let A be the matrix of a Hermitian form ( , ) on a complex vector space 
V, with respect to a basis B. If X and Y are the coordinate vectors of the vectors v 
and w, respectively, then (v, w) = X*AY and A is a Hermitian matrix. Conversely, if A 
is a Hermitian matrix, then the form on $\mathbb{C}^n$ defined by $(X, Y) = X^*AY$ is a Hermitian
form.


The proof is analogous to that of Proposition 8.1.5. □

Recall that if the form is Hermitian, (v, v) is a real number. A Hermitian form is
positive definite if (v, v) is positive for every nonzero vector v, and a Hermitian matrix
is positive definite if $X^*AX$ is positive for every nonzero complex column vector X. A
Hermitian form is positive definite if and only if its matrix with respect to an arbitrary basis
is positive definite.

The rule for a change of basis B’ = BP in the matrix of a Hermitian form is determined, 
as usual, by substituting PX’ = X and PY’ = Y: 


$X^*AY = (PX')^*A(PY') = X'^*(P^*AP)Y'$.
The matrix of the form with respect to the new basis is 
(8.3.9) A’ = P*AP. 


Corollary 8.3.10 


(a) Let A be the matrix of a Hermitian form with respect to a basis. The matrices that 
represent the same form with respect to different bases are those of the form A’ = P*AP, 
where P can be any invertible complex matrix. 


(b) A change of basis B’ = EP in $\mathbb{C}^n$ changes the standard Hermitian form $X^*Y$ to $X'^*A'Y'$,
where $A' = P^*P$. □


The next theorem gives the first of the many special properties of Hermitian matrices. 


Theorem 8.3.11 The eigenvalues, the trace, and the determinant of a Hermitian matrix A 
are real numbers. 


Proof. Since the trace and determinant can be expressed in terms of the eigenvalues, it 
suffices to show that the eigenvalues of a Hermitian matrix A are real. Let X be an eigenvector 
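of A with eigenvalue $\lambda$. Then


$X^*AX = X^*(AX) = X^*(\lambda X) = \lambda X^*X$.

We note that $(\lambda X)^* = \bar{\lambda}X^*$. Since $A^* = A$,

$X^*AX = (X^*A)X = (X^*A^*)X = (AX)^*X = (\lambda X)^*X = \bar{\lambda}X^*X$.


So $\lambda X^*X = \bar{\lambda}X^*X$. Since $X^*X$ is a positive real number, it is not zero. Therefore $\lambda = \bar{\lambda}$,
which means that $\lambda$ is real. □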


Please go over this proof carefully. It is simple, but so tricky that it seems hard to trust. Here
is a startling corollary:


Corollary 8.3.12 The eigenvalues of a real symmetric matrix are real numbers. 


Proof. When a real symmetric matrix is regarded as a complex matrix, it is Hermitian, so
the corollary follows from the theorem. □
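
In the 2 × 2 case the claim can be seen from the quadratic formula. For a Hermitian matrix with real diagonal entries a, d and off-diagonal entry b, the characteristic polynomial is $t^2 - (a+d)t + (ad - |b|^2)$, and its discriminant is $(a-d)^2 + 4|b|^2 \geq 0$. The sketch below computes the eigenvalues this way; the function name `hermitian_2x2_eigenvalues` is hypothetical and the sample entries are chosen only for illustration.

```python
import math

def hermitian_2x2_eigenvalues(a, b, d):
    """Eigenvalues of [[a, b], [conj(b), d]] with a, d real and b complex."""
    trace = a + d
    det = a * d - abs(b) ** 2
    disc = trace ** 2 - 4 * det      # equals (a - d)^2 + 4|b|^2, never negative
    root = math.sqrt(disc)
    return (trace + root) / 2, (trace - root) / 2

print(hermitian_2x2_eigenvalues(2, 1j, 2))   # a complex Hermitian matrix
print(hermitian_2x2_eigenvalues(1, 2, 1))    # a real symmetric matrix
```

Because the discriminant is a sum of squares, `math.sqrt` never receives a negative argument here, which is the 2 × 2 content of the theorem.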


This corollary would be difficult to prove without going over to complex matrices, though it
can be checked directly for a real symmetric 2 × 2 matrix.

A matrix P such that

(8.3.13) $P^*P = I$ (or $P^* = P^{-1}$)


is called a unitary matrix. A matrix P is unitary if and only if its columns $P_1, \ldots, P_n$ are
orthonormal with respect to the standard Hermitian form, i.e., if and only if $P_i^*P_i = 1$ and
$P_i^*P_j = 0$ when $i \neq j$. For example, the matrix $\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}$ is unitary.


The unitary matrices form a subgroup of the complex general linear group called the 
unitary group. It is denoted by $U_n$:


(8.3.14) $U_n = \{P \mid P^*P = I\}$.


We have seen that a change of basis in $\mathbb{R}^n$ preserves dot product if and only if the
change of basis matrix is orthogonal (5.1.14). Similarly, a change of basis in $\mathbb{C}^n$ preserves
the standard Hermitian form $X^*Y$ if and only if the change of basis matrix is unitary (see
(8.3.10)(b)).
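
A numerical check of the unitary condition, and of the fact that a unitary matrix preserves the standard Hermitian form, might look as follows. This is a sketch in floating point, so equalities are tested up to a tolerance; the helper names and the sample matrix P are assumptions made for the illustration.

```python
import math

def adjoint(A):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_unitary(P, tol=1e-12):
    """Test P*P = I entry by entry, up to rounding error."""
    n = len(P)
    G = matmul(adjoint(P), P)
    return all(abs(G[i][j] - (1 if i == j else 0)) <= tol
               for i in range(n) for j in range(n))

def hermitian_form(X, Y):
    return sum(x.conjugate() * y for x, y in zip(X, Y))

def apply(P, X):
    """Matrix-vector product PX."""
    return [sum(P[i][k] * X[k] for k in range(len(X))) for i in range(len(P))]

s = 1 / math.sqrt(2)
P = [[s, s * 1j], [s * 1j, s]]            # sample unitary matrix
X, Y = [1 + 2j, 3 - 1j], [2j, 1 + 1j]     # sample vectors

print(is_unitary(P), is_unitary([[1, 1], [0, 1]]))
print(abs(hermitian_form(apply(P, X), apply(P, Y)) - hermitian_form(X, Y)) < 1e-12)
```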


8.4 ORTHOGONALITY 


In this section we describe, at the same time, symmetric (bilinear) forms on a real vector 
space and Hermitian forms on a complex vector space. Throughout the section, we assume 
that we are given either a finite-dimensional real vector space V with a symmetric form, 
or a finite-dimensional complex vector space V with a Hermitian form. We won’t assume 
that the given form is positive definite. Reference to a symmetric form indicates that V is a 
real vector space, while reference to a Hermitian form indicates that V is a complex vector 
space. Though everything we do applies to both cases, it may be best for you to think of a 
symmetric form on a real vector space when reading this for the first time. 

In order to include Hermitian forms, bars will have to be put over some symbols. Since 
complex conjugation is the identity operation on the real numbers, we can ignore bars when 
considering symmetric forms. Also, the adjoint of a real matrix is equal to its transpose. 
When a matrix A is real, A* is the transpose of A. 


We assume given a symmetric or Hermitian form on a finite-dimensional vector space 
V. The basic concept used to study the form is orthogonality. 


• Two vectors v and w are orthogonal (written $v \perp w$) if
(v, w) = 0.


This extends the definition given before when the form is dot product. Note that $v \perp w$ if and
only if $w \perp v$.

What orthogonality of real vectors means geometrically depends on the form and also 
on a basis. One peculiar thing is that, when the form is indefinite, a nonzero vector v may 
be self-orthogonal: (v, v) = 0. Rather than trying to understand the geometric meaning of 
orthogonality for each symmetric form, it is best to work algebraically with the definition of 
orthogonality, (v, w) = 0, and let it go at that.

If W is a subspace of V, we may restrict the form on V to W, which means simply that
we take the same form but look at it only when the vectors are in W. It is obvious that if the 
form on V is symmetric, Hermitian, or positive definite, then its restriction to W will have 
the same property. 


• The orthogonal space to a subspace W of V, often denoted by $W^\perp$, is the subspace of
vectors v that are orthogonal to every vector in W, or symbolically, such that $v \perp W$:


(8.4.1) $W^\perp = \{v \in V \mid (v, w) = 0 \text{ for all } w \text{ in } W\}$.


• An orthogonal basis $B = (v_1, \ldots, v_n)$ of V is a basis whose vectors are mutually
orthogonal: $(v_i, v_j) = 0$ for all indices i and j with $i \neq j$. The matrix of the form with respect
to an orthogonal basis will be a diagonal matrix, and the form will be nondegenerate (see
below) if and only if the diagonal entries $(v_i, v_i)$ of the matrix are nonzero (see (8.4.4)(b)).


• A null vector v in V is a vector orthogonal to every vector in V, and the nullspace N of
the form is the set of null vectors. The nullspace can be described as the orthogonal space to
the whole space V:


$N = \{v \mid v \perp V\} = V^\perp$.


• The form on V is nondegenerate if its nullspace is the zero space {0}. This means that
for every nonzero vector v, there is a vector v’ such that $(v, v') \neq 0$. A form that isn’t
nondegenerate is degenerate. The most interesting forms are nondegenerate.


• The form on V is nondegenerate on a subspace W if its restriction to W is a nondegenerate
form, which means that for every nonzero vector w in W, there is a vector w’, also in W,
such that $(w, w') \neq 0$. A form may be degenerate on a subspace, though it is nondegenerate
on the whole space, and vice versa.


Lemma 8.4.2 The form is nondegenerate on W if and only if $W \cap W^\perp = \{0\}$. □


There is an important criterion for equality of vectors in terms of a nondegenerate 
form. 


Proposition 8.4.3 Let ( , ) be a nondegenerate symmetric or Hermitian form on V, and let 
v and v’ be vectors in V. If (v, w) = (v’, w) for all vectors w in V, then v = v’.


Proof. If (v, w) = (v’, w), then v − v’ is orthogonal to w. If this is true for all w in V, then
v − v’ is a null vector, and because the form is nondegenerate, v − v’ = 0. □


Proposition 8.4.4 Let ( , ) be a symmetric form on a real vector space or a Hermitian form 
on a complex vector space, and let A be its matrix with respect to a basis. 


(a) A vector v is a null vector if and only if its coordinate vector Y solves the homogeneous
equation AY = 0.
(b) The form is nondegenerate if and only if the matrix A is invertible. 


Proof. Via the basis, the form corresponds to the form $X^*AY$, so we may as well work with
that form. If Y is a vector such that AY = 0, then $X^*AY = 0$ for all X, which means that Y
is orthogonal to every vector, i.e., it is a null vector. Conversely, if $AY \neq 0$, then AY has a
nonzero coordinate. The matrix product $e_i^*AY$ picks out the ith coordinate of AY. So one of
those products is not zero, and therefore Y is not a null vector. This proves (a). Because A is
invertible if and only if the equation AY = 0 has no nontrivial solution, (b) follows. □
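
Two small real examples separate the notions of null vector and self-orthogonal vector. The sketch below uses hypothetical helper names (`form`, `det2`) and two sample 2 × 2 matrices that are not taken from the text.

```python
def form(A, X, Y):
    """Value X^t A Y of the real symmetric form with matrix A."""
    n = len(A)
    return sum(X[i] * A[i][j] * Y[j] for i in range(n) for j in range(n))

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

degenerate = [[1, 1], [1, 1]]      # det = 0, so a nonzero null vector exists
indefinite = [[1, 0], [0, -1]]     # det != 0, so the form is nondegenerate
E = [[1, 0], [0, 1]]               # standard basis vectors

# Y = (1, -1) solves AY = 0, so it is orthogonal to every basis vector.
print(det2(degenerate), [form(degenerate, e, [1, -1]) for e in E])

# (1, 1) is self-orthogonal for the indefinite form, but it is not a null vector.
print(det2(indefinite), form(indefinite, [1, 1], [1, 1]), form(indefinite, [1, 1], [1, 0]))
```

The second example shows that (v, v) = 0 for a nonzero v does not make the form degenerate; only a vector orthogonal to the whole space does.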


Theorem 8.4.5 Let ( , ) be a symmetric form on a real vector space V or a Hermitian form 
on a complex vector space V, and let W be a subspace of V. 


(a) The form is nondegenerate on W if and only if V is the direct sum $W \oplus W^\perp$.
(b) If the form is nondegenerate on V and on W, then it is nondegenerate on $W^\perp$.


When a vector space V is a direct sum $W_1 \oplus \cdots \oplus W_k$ and $W_i$ is orthogonal to $W_j$ for
$i \neq j$, V is said to be the orthogonal sum of the subspaces. The theorem asserts that if the
form is nondegenerate on W, then V is the orthogonal sum of W and $W^\perp$.


Proof of Theorem 8.4.5. (a) The conditions for a direct sum are $W \cap W^\perp = \{0\}$ and
$V = W + W^\perp$ (3.6.6)(c). The first condition simply restates the hypothesis that the form
be nondegenerate on the subspace. So if V is the direct sum, the form is nondegenerate.
We must show that if the form is nondegenerate on W, then every vector v in V can be
expressed as a sum v = w + u, with w in W and u in $W^\perp$.


We extend a basis $(w_1, \ldots, w_k)$ of W to a basis $B = (w_1, \ldots, w_k;\ v_1, \ldots, v_{n-k})$ of
V, and we write the matrix of the form with respect to this basis in block form

(8.4.6) $M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$,

where A is the upper left k × k submatrix.

The entries of the block A are $(w_i, w_j)$ for $i, j = 1, \ldots, k$, so A is the matrix of the
form restricted to W. Since the form is nondegenerate on W, A is invertible. The entries of
the block B are $(w_i, v_j)$ for $i = 1, \ldots, k$ and $j = 1, \ldots, n-k$. If we can choose the vectors
$v_1, \ldots, v_{n-k}$ so that B becomes zero, those vectors will be orthogonal to the basis of W,
so they will be in the orthogonal space $W^\perp$. Then since B is a basis of V, it will follow that
$V = W + W^\perp$, which is what we want to show.


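To achieve B = 0, we change basis using a matrix with a block form


(8.4.7) $P = \begin{bmatrix} I & Q \\ 0 & I \end{bmatrix}$,

where the block Q remains to be determined. The new basis B’ = BP will have the form
$(w_1, \ldots, w_k;\ v_1', \ldots, v_{n-k}')$. The basis of W will not change. The matrix of the form with
respect to the new basis will be


(8.4.8) $M' = P^*MP = \begin{bmatrix} I & 0 \\ Q^* & I \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} I & Q \\ 0 & I \end{bmatrix} = \begin{bmatrix} A & AQ + B \\ \cdot & \cdot \end{bmatrix}$.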


We don’t need to compute the other entries. When we set $Q = -A^{-1}B$, the upper right block
of M’ becomes zero, as desired.

(b) Suppose that the form is nondegenerate on V and on W. (a) shows that $V = W \oplus W^\perp$.
If we choose a basis for V by appending bases for W and $W^\perp$, the matrix of the form on V
will be a diagonal block matrix, where the blocks are the matrices of the form restricted to
W and to $W^\perp$. The matrix of the form on V is invertible (8.4.4), so the blocks are invertible.
It follows that the form is nondegenerate on $W^\perp$. □
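
The choice $Q = -A^{-1}B$ can be tried out in the simplest case, where W is spanned by a single vector and A is a 1 × 1 block. The sketch below works in exact rational arithmetic; the matrix M, the helper `form`, and the restriction to a one-dimensional W are all assumptions made to keep the example short.

```python
from fractions import Fraction

def form(M, X, Y):
    """Value X^t M Y of the real symmetric form with matrix M."""
    n = len(M)
    return sum(X[i] * M[i][j] * Y[j] for i in range(n) for j in range(n))

# Sample symmetric matrix of a form on R^3 (not from the text).
M = [[2, 1, 1],
     [1, 0, 3],
     [1, 3, 1]]

w1 = [1, 0, 0]                    # basis of W; the block A is the 1 x 1 matrix [(w1, w1)]
A = form(M, w1, w1)               # must be nonzero: the form is nondegenerate on W
old = [[0, 1, 0], [0, 0, 1]]      # the vectors v_1, v_2 that extend the basis

# Column j of Q is -A^{-1} (w1, v_j); the new vector is v_j' = v_j + w1 * Q_j.
new = []
for v in old:
    q = -Fraction(form(M, w1, v), A)
    new.append([vi + q * wi for vi, wi in zip(v, w1)])

print(new)
print([form(M, w1, v) for v in new])   # the new block B: every entry is 0
```

For a subspace of higher dimension the same recipe applies, but computing $A^{-1}B$ then requires solving a linear system rather than dividing by a scalar.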


Lemma 8.4.9 If a symmetric or Hermitian form is not identically zero, there is a vector v in
V such that $(v, v) \neq 0$.


Proof. If the form is not identically zero, there will be vectors x and y such that (x, y) is not
zero. If the form is Hermitian, we replace y by cy where c is a nonzero complex number, to
make (x, y) real and still not zero. Then (y, x) = (x, y). We expand:


$(x + y, x + y) = (x, x) + 2(x, y) + (y, y)$.


Since the term 2(x, y) isn’t zero, at least one of the three other terms in the equation isn’t
zero. □


Theorem 8.4.10 Let ( , ) be a symmetric form on a real vector space V or a Hermitian form
on a complex vector space V. There exists an orthogonal basis for V. 


Proof. Case 1: The form is identically zero. Then every basis is orthogonal. 


Case 2: The form is not identically zero. By induction on dimension, we may assume that
there is an orthogonal basis for the restriction of the form to any proper subspace of V.
We apply Lemma 8.4.9 and choose a vector $v_1$ with $(v_1, v_1) \neq 0$ as the first vector in our
basis. Let W be the span of $(v_1)$. The matrix of the form restricted to W is the 1 × 1 matrix
whose entry is $(v_1, v_1)$. It is an invertible matrix, so the form is nondegenerate on W. By
Theorem 8.4.5, $V = W \oplus W^\perp$. By our induction assumption, $W^\perp$ has an orthogonal basis,
say $(v_2, \ldots, v_n)$. Then $(v_1, v_2, \ldots, v_n)$ will be an orthogonal basis of V. □
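
The induction in this proof translates into a short recursive procedure. The sketch below handles only real symmetric forms with rational entries, and it is one possible reading of the proof rather than a canonical algorithm; the function names are hypothetical. The step that searches for a vector with (v, v) ≠ 0 follows Lemma 8.4.9: if no basis vector works, a sum x + y with (x, y) ≠ 0 does.

```python
from fractions import Fraction

def form(A, X, Y):
    """Value X^t A Y of the real symmetric form with matrix A."""
    n = len(A)
    return sum(X[i] * A[i][j] * Y[j] for i in range(n) for j in range(n))

def orthogonal_basis(A, vectors):
    """Orthogonal basis for the span of `vectors` (assumed independent)."""
    vectors = [[Fraction(c) for c in v] for v in vectors]
    if not vectors:
        return []
    # Look for a vector with (v, v) != 0 among the given vectors ...
    pivot = None
    for i, x in enumerate(vectors):
        if form(A, x, x) != 0:
            pivot = i
            break
    # ... and otherwise among sums x + y, since (x + y, x + y) = 2(x, y) then.
    if pivot is None:
        for i, x in enumerate(vectors):
            for y in vectors[i + 1:]:
                if form(A, x, y) != 0:
                    vectors[i] = [a + b for a, b in zip(x, y)]
                    pivot = i
                    break
            if pivot is not None:
                break
    if pivot is None:
        return vectors              # Case 1: the form is identically zero
    v1 = vectors.pop(pivot)
    d = form(A, v1, v1)
    # Move every remaining vector into the orthogonal space of span(v1).
    rest = [[u_k - v_k * form(A, v1, u) / d for u_k, v_k in zip(u, v1)]
            for u in vectors]
    return [v1] + orthogonal_basis(A, rest)

H = [[0, 1], [1, 0]]                # both standard basis vectors are self-orthogonal
basis = orthogonal_basis(H, [[1, 0], [0, 1]])
print(basis, [form(H, v, v) for v in basis])
```

For the sample matrix H the procedure has to fall back on the sum of the two basis vectors, and the resulting diagonal entries are 2 and −1/2, one positive and one negative.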


Orthogonal Projection 


Suppose that our given form is nondegenerate on a subspace W. Theorem 8.4.5 tells us that
V is the direct sum $W \oplus W^\perp$. Every vector v in V can be written uniquely in the form
v = w + u, with w in W and u in $W^\perp$. The orthogonal projection from V to W is the map
$\pi: V \to W$ defined by $\pi(v) = w$. The decomposition v = w + u is compatible with sums of
vectors and with scalar multiplication, so $\pi$ is a linear transformation.

The orthogonal projection is the unique linear transformation from V to W such that
$\pi(w) = w$ if w is in W and $\pi(u) = 0$ if u is in $W^\perp$.


Note: If the form is degenerate on a subspace W, the orthogonal projection to W doesn’t
exist. The reason is that $W \cap W^\perp$ will contain a nonzero element x, and it will be impossible
to have both $\pi(x) = x$ and $\pi(x) = 0$. □


The next theorem provides a very important formula for orthogonal projection.

Theorem 8.4.11 Projection Formula. Let ( , ) be a symmetric form on a real vector space V
or a Hermitian form on a complex vector space V, and let W be a subspace of V on which
the form is nondegenerate. If $(w_1, \ldots, w_k)$ is an orthogonal basis for W, the orthogonal
projection $\pi: V \to W$ is given by the formula $\pi(v) = w_1c_1 + \cdots + w_kc_k$, where

$c_i = \dfrac{(w_i, v)}{(w_i, w_i)}$.


Proof. Because the form is nondegenerate on W and its matrix with respect to an orthogonal
basis is diagonal, $(w_i, w_i) \neq 0$. The formula makes sense. Given a vector v, let w denote the
vector $w_1c_1 + \cdots + w_kc_k$, with $c_i$ as above. This is an element of W, so if we show that
v − w = u is in $W^\perp$, it will follow that $\pi(v) = w$, as the theorem asserts. To show that u is
in $W^\perp$, we show that $(w_i, u) = 0$ for $i = 1, \ldots, k$. We remember that $(w_i, w_j) = 0$ if $i \neq j$.
Then


$(w_i, u) = (w_i, v) - (w_i, w) = (w_i, v) - \big((w_i, w_1)c_1 + \cdots + (w_i, w_k)c_k\big)$

$= (w_i, v) - (w_i, w_i)c_i = 0$. □


Warning: This projection formula is not correct unless the basis is orthogonal. 


Example 8.4.12 Let V be the space $\mathbb{R}^3$ of column vectors, and let (v, w) denote the dot
product form. Let W be the subspace spanned by the vector $w_1$ whose coordinate vector is
$(1, 1, 1)^t$. Let $(x_1, x_2, x_3)^t$ be the coordinate vector of a vector v. Then $(w_1, v) = x_1 + x_2 + x_3$.
The projection formula reads $\pi(v) = w_1c$, where $c = (x_1 + x_2 + x_3)/3$. □


If a form is nondegenerate on the whole space V, the orthogonal projection from V to 
V will be the identity map. The projection formula is interesting in this case too, because it 
can be used to compute the coordinates of a vector v with respect to an orthogonal basis. 


Corollary 8.4.13 Let ( , ) be a nondegenerate symmetric form on a real vector space V
or a nondegenerate Hermitian form on a complex vector space V, let $(v_1, \ldots, v_n)$ be an
orthogonal basis for V, and let v be any vector. Then $v = v_1c_1 + \cdots + v_nc_n$, where


$c_i = \dfrac{(v_i, v)}{(v_i, v_i)}$. □


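Example 8.4.14 Let $B = (v_1, v_2, v_3)$ be the orthogonal basis of $\mathbb{R}^3$ whose coordinate vectors are


$\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}$.


Let v be a vector with coordinate vector $(x_1, x_2, x_3)^t$. Then $v = v_1c_1 + v_2c_2 + v_3c_3$ and


$c_1 = (x_1 + x_2 + x_3)/3, \quad c_2 = (x_1 - x_2)/2, \quad c_3 = (x_1 + x_2 - 2x_3)/6$. □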
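
The projection formula and the coordinate formula are one computation, which the following sketch carries out for the dot product on $\mathbb{R}^3$ in exact arithmetic. The function names `coefficients` and `project` and the sample vector v are assumptions made for the illustration; the basis is the orthogonal basis $(1,1,1)^t, (1,-1,0)^t, (1,1,-2)^t$.

```python
from fractions import Fraction

def dot(X, Y):
    return sum(x * y for x, y in zip(X, Y))

def coefficients(basis, v, form=dot):
    """c_i = (w_i, v) / (w_i, w_i) for an orthogonal basis (w_1, ..., w_k)."""
    return [Fraction(form(w, v), form(w, w)) for w in basis]

def project(basis, v, form=dot):
    """pi(v) = w_1 c_1 + ... + w_k c_k; valid only when the basis is orthogonal."""
    cs = coefficients(basis, v, form)
    return [sum(c * w[k] for c, w in zip(cs, basis)) for k in range(len(v))]

B = [[1, 1, 1], [1, -1, 0], [1, 1, -2]]
v = [3, 1, 2]

print(coefficients(B, v))     # coordinates of v with respect to B
print(project(B[:1], v))      # projection to the line spanned by (1, 1, 1)
print(project(B, v) == v)     # projecting to the whole space returns v
```

With a basis that is not orthogonal, `project` would return a vector of W that is in general not the orthogonal projection, which is the point of the warning above.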


Next, we consider scaling of the vectors that make up an orthogonal basis.

Corollary 8.4.15 Let ( , ) be a symmetric form on a real vector space V or a Hermitian form
on a complex vector space V.


(a) There is an orthogonal basis $B = (v_1, \ldots, v_n)$ for V with the property that for each i,
$(v_i, v_i)$ is equal to 1, −1, or 0.

(b) Matrix form: If A is a real symmetric n × n matrix, there is an invertible real matrix P
such that $P^tAP$ is a diagonal matrix, each of whose diagonal entries is 1, −1, or 0. If A
is a complex Hermitian n × n matrix, there is an invertible complex matrix P such that
$P^*AP$ is a diagonal matrix, each of whose diagonal entries is 1, −1, or 0.


Proof. (a) Let $(v_1, \ldots, v_n)$ be an orthogonal basis. If v is a vector, then for any nonzero
real number c, $(cv, cv) = c^2(v, v)$, and $c^2$ can be any positive real number. So if we multiply
$v_i$ by a scalar, we can adjust the real number $(v_i, v_i)$ by an arbitrary positive real number.
This proves (a). Part (b) follows in the usual way, by applying (a) to the form $X^*AY$. □


If we arrange an orthogonal basis that has been scaled suitably, the matrix of the form 
will have a block decomposition 


(8.4.16) $A = \begin{bmatrix} I_p & & \\ & -I_m & \\ & & 0_z \end{bmatrix}$,


where p, m, and z are the numbers of 1’s, −1’s, and 0’s on the diagonal, and p + m + z = n.
The form is nondegenerate if and only if z = 0. 

If the form is nondegenerate, the pair of integers (p, m) is called the signature of the 
form. Sylvester’s Law (see Exercise 4.21) asserts that the signature does not depend on the 
choice of the orthogonal basis. 

The notation $I_{p,m}$ is often used to denote the diagonal matrix


(8.4.17) $I_{p,m} = \begin{bmatrix} I_p & \\ & -I_m \end{bmatrix}$.


With this notation, the matrix (8.2.2) that represents the Lorentz form is $I_{3,1}$.

The form is positive definite if and only if m and z are both zero. Then the normalized
basis has the property that $(v_i, v_i) = 1$ for each i, and $(v_i, v_j) = 0$ when $i \neq j$. This is called
an orthonormal basis, in agreement with the terminology introduced before, for bases of $\mathbb{R}^n$
(5.1.8). An orthonormal basis B refers the form back to dot product on $\mathbb{R}^n$ or to the standard
Hermitian form on $\mathbb{C}^n$. That is, if v = BX and w = BY, then $(v, w) = X^*Y$. An orthonormal
basis exists if and only if the form is positive definite.


Note: If B is an orthonormal basis for a subspace W of V, the projection from V to W is
given by the formula $\pi(v) = w_1c_1 + \cdots + w_kc_k$, where $c_i = (w_i, v)$. The projection formula is
simpler because the denominators $(w_i, w_i)$ in (8.4.11) are equal to 1. However, normalizing
the vectors requires extracting a square root, and because of this, it is sometimes preferable
to work with an orthogonal basis without normalizing. □


The proof of the remaining implication (iii) ⇒ (i) of Theorem 8.2.5 follows from this
discussion:

Corollary 8.4.18 If a real matrix A is symmetric and positive definite, then the form $X^tAY$
represents dot product with respect to some basis of $\mathbb{R}^n$.


When a positive definite symmetric or Hermitian form is given, the projection formula
provides an inductive method, called the Gram-Schmidt procedure, to produce an orthonormal
basis, starting with an arbitrary basis $(v_1, \ldots, v_n)$. The procedure is as follows: Let $V_k$
denote the space spanned by the basis vectors $(v_1, \ldots, v_k)$. Suppose that, for some $k \leq n$,
we have found an orthonormal basis $(w_1, \ldots, w_{k-1})$ for $V_{k-1}$. Let $\pi$ denote the orthogonal
projection from V to $V_{k-1}$. Then $\pi(v_k) = w_1c_1 + \cdots + w_{k-1}c_{k-1}$, where $c_i = (w_i, v_k)$,
and $w_k = v_k - \pi(v_k)$ is orthogonal to $V_{k-1}$. When we normalize $(w_k, w_k)$ to 1, the set
$(w_1, \ldots, w_k)$ will be an orthonormal basis for $V_k$. □
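
The procedure can be sketched directly from this description. The version below is restricted to a real symmetric positive definite form given by its matrix A, and it works in floating point because of the square roots; the function names and the sample matrix are assumptions made for the illustration.

```python
import math

def form(A, X, Y):
    """Value X^t A Y of the real symmetric form with matrix A."""
    n = len(A)
    return sum(X[i] * A[i][j] * Y[j] for i in range(n) for j in range(n))

def gram_schmidt(A, vectors):
    """Orthonormal basis for the form X^t A Y, built from an arbitrary basis."""
    ortho = []
    for v in vectors:
        # pi(v_k) = w_1 c_1 + ... + w_{k-1} c_{k-1}, with c_i = (w_i, v_k)
        cs = [form(A, w, v) for w in ortho]
        w = [v[k] - sum(c * u[k] for c, u in zip(cs, ortho))
             for k in range(len(v))]
        length = math.sqrt(form(A, w, w))     # positive, since the form is positive definite
        ortho.append([x / length for x in w])
    return ortho

A = [[2, 1], [1, 1]]                          # sample positive definite matrix
W = gram_schmidt(A, [[1, 0], [0, 1]])

# The matrix of the form with respect to W should be the identity, up to rounding.
print([[round(form(A, W[i], W[j]), 12) for j in range(2)] for i in range(2)])
```

If the form is not positive definite, the square root may fail or the length may be zero, so this sketch should not be applied to indefinite forms.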


The last topic of this section is a criterion for a symmetric form to be positive definite
in terms of its matrix with respect to an arbitrary basis. Let $A = (a_{ij})$ be the matrix of a
symmetric form with respect to a basis $B = (v_1, \ldots, v_n)$ of V, and let $A_k$ denote the k × k
minor made up of the matrix entries $a_{ij}$ with $i, j \leq k$:


$A_1 = [a_{11}], \quad A_2 = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad \ldots, \quad A_n = A$.


Theorem 8.4.19 The form and the matrix are positive definite if and only if $\det A_k > 0$ for
$k = 1, \ldots, n$.
We leave the proof as an exercise. □
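
The criterion is straightforward to apply by machine. The sketch below computes the determinants of the leading minors by cofactor expansion, which is adequate for small matrices; the function names and the sample matrices are assumptions made for the illustration.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def leading_minors(A):
    """Determinants det A_1, ..., det A_n of the upper left k x k blocks."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def is_positive_definite(A):
    """Criterion of the theorem, for a real symmetric matrix A."""
    return all(d > 0 for d in leading_minors(A))

S = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
T = [[1, 2], [2, 1]]

print(leading_minors(S), is_positive_definite(S))
print(leading_minors(T), is_positive_definite(T))
```

Note that the test requires every leading minor to be positive; a positive determinant of the full matrix alone is not enough.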


For example, the matrix $A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$ is positive definite, because det [2] and det A are


both positive.


8.5 EUCLIDEAN SPACES AND HERMITIAN SPACES


When we work in $\mathbb{R}^n$, we may wish to change the basis. But if our problem involves dot
products (that is, if length or orthogonality of vectors is involved), a change to an arbitrary
new basis may be undesirable, because it will not preserve length and orthogonality. It
is best to restrict oneself to orthonormal bases, so that dot products are preserved. The
concept of a Euclidean space provides us with a framework in which to do this. A real 
vector space together with a positive definite symmetric form is called a Euclidean space, 
and a complex vector space together with a positive definite Hermitian form is called a 
Hermitian space. 

The space $\mathbb{R}^n$, with dot product, is the standard Euclidean space. An orthonormal
basis for any Euclidean space will refer the space back to the standard Euclidean space.
Similarly, the standard Hermitian form $(X, Y) = X^*Y$ makes $\mathbb{C}^n$ into the standard Hermitian
space, and an orthonormal basis for any Hermitian space will refer the form back to the
standard Hermitian space. The only significant difference between an arbitrary Euclidean 
or Hermitian space and the standard Euclidean or Hermitian space is that no orthonormal 
basis is preferred. Nevertheless, when working in such spaces we always use orthonormal 
bases, though none have been picked out for us. A change of orthonormal bases will be
given by a matrix that is orthogonal or unitary, according to the case.


Corollary 8.5.1 Let V be a Euclidean or a Hermitian space, with positive definite form
( , ), and let W be a subspace of V. The form is nondegenerate on W, and therefore
$V = W \oplus W^\perp$.


Proof. If w is a nonzero vector in W, then (w, w) is a positive real number. It is not zero, 
and therefore w is not a null vector in V or in W. The nullspaces are zero. □


What we have learned about symmetric forms allows us to interpret the length of a 
vector and the angle between two vectors v and w in a Euclidean space V. Let’s set aside the 
special case that these vectors are dependent, and assume that they span a two-dimensional 
subspace W. When we restrict the form, W becomes a Euclidean space of dimension 2. 
So W has an orthonormal basis $(w_1, w_2)$, and via this basis, the vectors v and w will
have coordinate vectors in $\mathbb{R}^2$. We’ll denote these two-dimensional coordinate vectors by
lowercase letters x and y. They aren’t the coordinate vectors that we would obtain using an
orthonormal basis for the whole space V, but we will have $(v, w) = x^ty$, and this allows us
to interpret geometric properties of the form in terms of dot product in $\mathbb{R}^2$.

The length |v| of a vector v is defined by the formula $|v|^2 = (v, v)$. If x is the coordinate
vector of v in $\mathbb{R}^2$, then $|v|^2 = x^tx$. The law of cosines $(x \cdot y) = |x||y| \cos\theta$ in $\mathbb{R}^2$ becomes


(8.5.2) $(v, w) = |v||w| \cos\theta$,


where $\theta$ is the angle between x and y. Since this formula expresses $\cos\theta$ in terms of the form,
it defines the unoriented angle $\theta$ between vectors v and w. But the ambiguity of sign in the
angle that arises because $\cos\theta = \cos(-\theta)$ can’t be eliminated. When one views a plane in
$\mathbb{R}^3$ from its front and its back, the angles one sees differ by sign.
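
Formula (8.5.2) can be turned into a small routine for the unoriented angle. In the sketch below the Euclidean structure is given by a positive definite symmetric matrix A; the function names are hypothetical, and the clamp guards only against rounding error pushing the cosine slightly outside [−1, 1].

```python
import math

def form(A, X, Y):
    """Value X^t A Y of the real symmetric form with matrix A."""
    n = len(A)
    return sum(X[i] * A[i][j] * Y[j] for i in range(n) for j in range(n))

def unoriented_angle(A, v, w):
    """Angle theta in [0, pi] with (v, w) = |v||w| cos(theta)."""
    cos_theta = form(A, v, w) / math.sqrt(form(A, v, v) * form(A, w, w))
    return math.acos(max(-1.0, min(1.0, cos_theta)))

I2 = [[1, 0], [0, 1]]                            # dot product on R^2
print(unoriented_angle(I2, [1, 0], [1, 1]))      # approximately pi/4
print(unoriented_angle(I2, [1, 1], [1, 0]))      # same value: the angle is unoriented
```

Since `math.acos` returns values in [0, π], the routine cannot distinguish θ from −θ, in agreement with the remark about the sign ambiguity.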


8.6 THE SPECTRAL THEOREM 


In this section, we analyze certain linear operators on a Hermitian space. 

Let $T: V \to V$ be a linear operator on a Hermitian space V, and let A be the matrix of
T with respect to an orthonormal basis B. The adjoint operator $T^*: V \to V$ is the operator
whose matrix with respect to the same basis is the adjoint matrix $A^*$.

If we change to a new orthonormal basis B’, the base change matrix P will be unitary,
and the new matrix of T will have the form $A' = P^*AP = P^{-1}AP$. Its adjoint will be
$A'^* = P^*A^*P$. This is the matrix of $T^*$ with respect to the new basis. So the definition of $T^*$
makes sense: It is independent of the orthonormal basis.

The rules (8.3.5) for computing with adjoint matrices carry over to adjoint operators: 


(8.6.1) $(T + U)^* = T^* + U^*$, $(TU)^* = U^*T^*$, $T^{**} = T$.


A normal matrix is a complex matrix A that commutes with its adjoint: $A^*A = AA^*$.
In itself, this isn’t a particularly important class of matrices, but it is the natural class for which
to state the Spectral Theorem that we prove in this section, and it includes two important
classes: Hermitian matrices ($A^* = A$) and unitary matrices ($A^* = A^{-1}$).
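
The definition can be tested on a few small matrices. In this sketch the helper names and the three sample matrices are assumptions made for the illustration; the second matrix shows that a normal matrix need be neither Hermitian nor unitary.

```python
def adjoint(A):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_normal(A):
    """A is normal when it commutes with its adjoint: A*A = AA*."""
    return matmul(adjoint(A), A) == matmul(A, adjoint(A))

hermitian = [[2, 1j], [-1j, 2]]
neither = [[1, 1], [-1, 1]]       # A*A = AA* = 2I, but A* != A and A*A != I
shear = [[1, 1], [0, 1]]          # not normal

print(is_normal(hermitian), is_normal(neither), is_normal(shear))
```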


Lemma 8.6.2 Let A be a complex n × n matrix and let P be an n × n unitary matrix. If A is
normal, Hermitian, or unitary, so is $P^*AP$. □


A linear operator T on a Hermitian space is called normal, Hermitian, or unitary 
if its matrix with respect to an orthonormal basis has the same property. So T is normal
if $T^*T = TT^*$, Hermitian if $T^* = T$, and unitary if $T^*T = I$. A Hermitian operator is
sometimes called a self-adjoint operator, but we won’t use that terminology.
The next proposition interprets these conditions in terms of the form. 


Proposition 8.6.3 Let T be a linear operator on a Hermitian space V, and let $T^*$ be the
adjoint operator.


(a) For all v and w in V, $(Tv, w) = (v, T^*w)$ and $(v, Tw) = (T^*v, w)$.
(b) T is normal if and only if, for all v and w in V, $(Tv, Tw) = (T^*v, T^*w)$.
(c) T is Hermitian if and only if, for all v and w in V, $(Tv, w) = (v, Tw)$.
(d) T is unitary if and only if, for all v and w in V, $(Tv, Tw) = (v, w)$.


Proof. (a) Let A be the matrix of the operator T with respect to an orthonormal basis B.
With v = BX and w = BY as usual, $(Tv, w) = (AX)^*Y = X^*A^*Y$ and $(v, T^*w) = X^*A^*Y$.
Therefore $(Tv, w) = (v, T^*w)$. The proof of the other formula of (a) is similar.


(b) We substitute $T^*v$ for v into the first equation of (a): $(TT^*v, w) = (T^*v, T^*w)$. Similarly,
substituting Tv for v into the second equation of (a): $(Tv, Tw) = (T^*Tv, w)$. So if T is
normal, then $(Tv, Tw) = (T^*v, T^*w)$. The converse follows by applying Proposition 8.4.3
to the two vectors $T^*Tv$ and $TT^*v$. The proofs of (c) and (d) are similar. □


Let T be a linear operator on a Hermitian space V. As before, a subspace W of V is 
T-invariant if $TW \subset W$. A linear operator T will restrict to a linear operator on a T-invariant
subspace, and if T is normal, Hermitian, or unitary, the restricted operator will have the 
same property. This follows from Proposition 8.6.3. 


Proposition 8.6.4 Let T be a linear operator on a Hermitian space V and let W be a subspace
of V. If W is T-invariant, then the orthogonal space $W^\perp$ is $T^*$-invariant. If W is $T^*$-invariant
then $W^\perp$ is T-invariant.


Proof. Suppose that W is T-invariant. To show that $W^\perp$ is $T^*$-invariant, we must show that
if u is in $W^\perp$, then $T^*u$ is also in $W^\perp$, which by definition of $W^\perp$ means that $(w, T^*u) = 0$
for all w in W. By Proposition 8.6.3, $(w, T^*u) = (Tw, u)$. Since W is T-invariant, Tw is in
W. Then since u is in $W^\perp$, (Tw, u) = 0. So $(w, T^*u) = 0$, as required. Since $T^{**} = T$, one
obtains the second assertion by interchanging the roles of T and $T^*$. □


The next theorem is the main place that we use the hypothesis that the form given on 
V be positive definite. 


Theorem 8.6.5 Let T be a normal operator on a Hermitian space V, and let v be an
eigenvector of T with eigenvalue $\lambda$. Then v is also an eigenvector of $T^*$, with eigenvalue $\bar{\lambda}$.


Proof. Case 1: $\lambda = 0$. Then Tv = 0, and we must show that $T^*v = 0$. Since the form is
positive definite, it suffices to show that $(T^*v, T^*v) = 0$. By Proposition 8.6.3, $(T^*v, T^*v) =
(Tv, Tv) = (0, 0) = 0$.

Case 2: $\lambda$ is arbitrary. Let S denote the linear operator $T - \lambda I$. Then v is an eigenvector for
S with eigenvalue zero: Sv = 0. Moreover, $S^* = T^* - \bar{\lambda}I$. You can check that S is a normal
operator. By Case 1, v is an eigenvector for $S^*$ with eigenvalue 0: $S^*v = T^*v - \bar{\lambda}v = 0$. This
shows that v is an eigenvector of $T^*$ with eigenvalue $\bar{\lambda}$. □


Theorem 8.6.6 Spectral Theorem for Normal Operators 
(a) Let T be a normal operator on a Hermitian space V. There is an orthonormal basis of
V consisting of eigenvectors for T.


(b) Matrix form: Let A be a normal matrix. There is a unitary matrix P such that $P^*AP$ is
diagonal.


Proof. (a) We choose an eigenvector $v_1$ for T, and normalize its length to 1. Theorem 8.6.5
tells us that $v_1$ is also an eigenvector for $T^*$. Therefore the one-dimensional subspace W
spanned by $v_1$ is $T^*$-invariant. By Proposition 8.6.4, $W^\perp$ is T-invariant. We also know that
$V = W \oplus W^\perp$. The restriction of T to any invariant subspace, including $W^\perp$, is a normal
operator. By induction on dimension, we may assume that $W^\perp$ has an orthonormal basis of
eigenvectors, say $(v_2, \ldots, v_n)$. Adding $v_1$ to this set yields an orthonormal basis of V of
eigenvectors for T.


(b) This is proved from (a) in the usual way. We regard A as the matrix of the normal
operator of multiplication by A on $\mathbb{C}^n$. By (a) there is an orthonormal basis B consisting of
eigenvectors. The matrix P of change of basis from E to B is unitary, and the matrix of the
operator with respect to the new basis, which is $P^*AP$, is diagonal. □


The next corollaries are obtained by applying the Spectral Theorem to the two most 
important types of normal matrices. 

Corollary 8.6.7 Spectral Theorem for Hermitian Operators.
(a) Let T be a Hermitian operator on a Hermitian space V.
(i) There is an orthonormal basis of V consisting of eigenvectors of T.
(ii) The eigenvalues of T are real numbers.
(b) Matrix form: Let A be a Hermitian matrix.
(i) There is a unitary matrix P such that $P^*AP$ is a real diagonal matrix.
(ii) The eigenvalues of A are real numbers.


Proof. Part (b)(ii) has been proved before (Theorem 8.3.11) and (a)(i) follows from the
Spectral Theorem for normal operators. The other assertions are variants. □


Corollary 8.6.8 Spectral Theorem for Unitary Matrices. 


(a) Let A be a unitary matrix. There is a unitary matrix P such that $P^*AP$ is diagonal.
(b) Every conjugacy class in the unitary group $U_n$ contains a diagonal matrix. □


To diagonalize a Hermitian matrix M, one can proceed by determining its eigenvectors.
If the eigenvalues are distinct, the corresponding eigenvectors will be orthogonal,
and one can normalize their lengths to 1. This follows from the Spectral Theorem. For
example, $v_1' = \begin{bmatrix} 1 \\ -i \end{bmatrix}$ and $v_2' = \begin{bmatrix} 1 \\ i \end{bmatrix}$ are eigenvectors of the Hermitian matrix $M = \begin{bmatrix} 2 & i \\ -i & 2 \end{bmatrix}$
with eigenvalues 3 and 1, respectively. We normalize their lengths to 1 by the factor $1/\sqrt{2}$,
obtaining the unitary matrix $P = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -i & i \end{bmatrix}$. Then $P^*MP = \begin{bmatrix} 3 & \\ & 1 \end{bmatrix}$.

However, the Spectral Theorem asserts that a Hermitian matrix can be diagonalized even
when its eigenvalues aren’t distinct. For instance, the only 2 × 2 Hermitian matrix whose
characteristic polynomial has a double root $\lambda$ is $\lambda I$.
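
A diagonalization of this kind can be verified numerically. The sketch below takes the Hermitian matrix with rows (2, i) and (−i, 2), whose eigenvalues are 3 and 1, together with the matrix P whose columns are the normalized eigenvectors $(1, -i)^t/\sqrt{2}$ and $(1, i)^t/\sqrt{2}$; the helper names are assumptions, and the result is exact only up to rounding.

```python
import math

def adjoint(A):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

s = 1 / math.sqrt(2)
M = [[2, 1j], [-1j, 2]]
P = [[s, s], [-1j * s, 1j * s]]     # columns: normalized eigenvectors for 3 and 1

D = matmul(matmul(adjoint(P), M), P)
print(D)                            # approximately the diagonal matrix with entries 3, 1
```

The order of the columns of P determines the order of the eigenvalues on the diagonal of $P^*MP$.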


What we have proved for Hermitian matrices has analogues for real symmetric 
matrices. A symmetric operator T on a Euclidean space V is a linear operator whose matrix 
with respect to an orthonormal basis is symmetric. Similarly, an orthogonal operator T on a
Euclidean space V is a linear operator whose matrix with respect to an orthonormal basis is
orthogonal.


Proposition 8.6.9 Let T be a linear operator on a Euclidean space V.
(a) T is symmetric if and only if, for all v and w in V, (Tv, w) = (v, Tw).
(b) T is orthogonal if and only if, for all v and w in V, (Tv, Tw) = (v, w). □


Theorem 8.6.10 Spectral Theorem for Symmetric Operators. 


(a) Let T be a symmetric operator on a Euclidean space V.


(i) There is an orthonormal basis of V consisting of eigenvectors of T. 
(ii) The eigenvalues of T are real numbers. 


(b) Matrix form: Let A be a real symmetric matrix.


(i) There is an orthogonal matrix P such that $P^tAP$ is a real diagonal matrix.
(ii) The eigenvalues of A are real numbers.


Proof. We have noted (b)(ii) before (Corollary 8.3.12), and (a)(ii) follows. Knowing this,
the proof of (a)(i) follows the pattern of the proof of Theorem 8.6.6. □


The Spectral Theorem is a powerful tool. When faced with a Hermitian operator or a 
Hermitian matrix, it should be an automatic response to apply that theorem. 


8.7 CONICS AND QUADRICS


Ellipses, hyperbolas, and parabolas are called conics. They are loci in $\mathbb{R}^2$ defined by quadratic
equations f = 0, where


(8.7.1) $f(x_1, x_2) = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 + b_1x_1 + b_2x_2 + c$,


and the coefficients $a_{ij}$, $b_i$, and c are real numbers. (The reason that the coefficient of $x_1x_2$
is written as $2a_{12}$ will be explained presently.) If the locus f = 0 of a quadratic equation is
not a conic, we call it a degenerate conic. A degenerate conic can be a pair of lines, a single
line, a point, or empty, depending on the equation. To emphasize that a particular locus is
not degenerate, we may sometimes refer to it as a nondegenerate conic. The term quadric is
used to designate an analogous locus in three or more dimensions.

We propose to describe the orbits of the conics under the action of the group of 
isometries of the plane. Two nondegenerate conics are in the same orbit if and only if they 
are congruent geometric figures. 

The quadratic part of the polynomial $f(x_1, x_2)$ is called a quadratic form:


(8.7.2) $q(x_1, x_2) = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2$.

A quadratic form in any number of variables is a polynomial, each of whose terms has
degree 2 in the variables. It is convenient to express the quadratic form q in matrix notation.
To do this, we introduce the symmetric matrix


(8.7.3) $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix}$.


Then if $X = (x_1, x_2)^t$, the quadratic form can be written as $q(x_1, x_2) = X^tAX$. We put
the coefficient 2 into Formulas 8.7.1 and 8.7.2 in order to avoid some coefficients $\frac{1}{2}$ in this
matrix. If we also introduce the 1 × 2 matrix $B = [b_1 \; b_2]$, the equation f = 0 can be written
compactly in matrix notation as


(8.7.4) $X^tAX + BX + c = 0$.


Theorem 8.7.5 Every nondegenerate conic is congruent to one of the following loci, where
the coefficients $a_{11}$ and $a_{22}$ are positive:


Ellipse: $a_{11}x_1^2 + a_{22}x_2^2 - 1 = 0$,
Hyperbola: $a_{11}x_1^2 - a_{22}x_2^2 - 1 = 0$,
Parabola: $a_{11}x_1^2 - x_2 = 0$.


The coefficients $a_{11}$ and $a_{22}$ are determined by the congruence class of the conic, except that
they can be interchanged in the equation of an ellipse.


Proof. We simplify the equation (8.7.4) in two steps, first applying an orthogonal transfor- 
mation to diagonalize the matrix A and then applying a translation to eliminate the linear 
terms and the constant term when possible. 


The Spectral Theorem for symmetric operators (8.6.10) asserts that there is a 2 × 2
orthogonal matrix P such that $P^tAP$ is diagonal. We make the change of variable $PX' = X$,
and substitute into (8.7.4):


(8.7.6) $X'^tA'X' + B'X' + c = 0$


where $A' = P^tAP$ and $B' = BP$. With this orthogonal change of variable, the quadratic form
becomes diagonal, that is, the coefficient of $x_1'x_2'$ is zero. We drop the primes. When the
quadratic form is diagonal, f has the form


$f(x_1, x_2) = a_{11}x_1^2 + a_{22}x_2^2 + b_1x_1 + b_2x_2 + c$.

To continue, we eliminate $b_i$ by "completing squares," with the substitutions


(8.7.7) $x_i = x_i' - \frac{b_i}{2a_{ii}}$.


This substitution corresponds to a translation of coordinates. Dropping primes again, f
becomes


(8.7.8) $f(x_1, x_2) = a_{11}x_1^2 + a_{22}x_2^2 + c = 0$,


where the constant term c has changed. The new constant can be computed when needed.
When it is zero, the locus is degenerate. Assuming that $c \neq 0$, we can multiply f by a scalar
to change c to -1. If the $a_{ii}$ are both negative, the locus is empty, hence degenerate. So at least
one of the coefficients is positive, and we may assume that $a_{11} > 0$. Then we are left with the
equations of the ellipses and the hyperbolas in the statement of the theorem.


The parabola arises because the substitution made to eliminate the linear coefficient
$b_i$ requires $a_{ii}$ to be nonzero. Since the equation f is supposed to be quadratic, these
coefficients aren't both zero, and we may assume $a_{11} \neq 0$. If $a_{22} = 0$ but $b_2 \neq 0$, we eliminate
$b_1$ and use the substitution


(8.7.9) $x_2 = x_2' - c/b_2$


to eliminate the constant term. Adjusting f by a scalar factor and eliminating degenerate
cases leaves us with the equation of the parabola.
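For quick experiments, the type of a conic can also be read off from two determinants without carrying out the reduction. The sketch below uses the classical determinant test, which is a shortcut and not the method of the proof above: with the 3 × 3 symmetric matrix built from A, B/2, and c, the locus is treated as degenerate when that determinant vanishes, and otherwise the sign of det A separates the three families. The function names `det3` and `conic_type` are hypothetical. In line with the text, an empty locus is reported as degenerate.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def conic_type(a11, a12, a22, b1, b2, c):
    """Classify a11*x1^2 + 2*a12*x1*x2 + a22*x2^2 + b1*x1 + b2*x2 + c = 0."""
    a11, a12, a22, b1, b2, c = map(Fraction, (a11, a12, a22, b1, b2, c))
    big = [[a11, a12, b1 / 2],
           [a12, a22, b2 / 2],
           [b1 / 2, b2 / 2, c]]
    big_det = det3(big)                 # zero exactly for the degenerate loci
    small_det = a11 * a22 - a12 * a12   # det A
    if big_det == 0:
        return "degenerate"
    if small_det < 0:
        return "hyperbola"              # A is indefinite
    if small_det == 0:
        return "parabola"
    # A is definite: a real ellipse if trace(A) and big_det have opposite signs,
    # otherwise the locus is empty.
    return "ellipse" if (a11 + a22) * big_det < 0 else "degenerate"


print(conic_type(1, 0, 1, 0, 0, -1))   # the unit circle
print(conic_type(1, 0, 0, 0, -1, 0))   # x1^2 - x2 = 0
```

Exact rational arithmetic is used so that the tests for zero are reliable when the coefficients are integers or fractions.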


Example 8.7.10 Let f be the quadratic polynomial $x_1^2 + 2x_1x_2 - x_2^2 + 2x_1 + 2x_2 - 1$. Then


$A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$, $B = [2 \; 2]$, and $c = -1$.


The eigenvalues of A are $\pm\sqrt{2}$. Setting $a = \sqrt{2} - 1$ and $b = \sqrt{2} + 1$, the vectors


$v_1 = \begin{bmatrix} 1 \\ a \end{bmatrix}$, $v_2 = \begin{bmatrix} 1 \\ -b \end{bmatrix}$


are eigenvectors with eigenvalues $\sqrt{2}$ and $-\sqrt{2}$, respectively. They are orthogonal, and when
we normalize their lengths to 1, they will form an orthonormal basis B such that $[B]^{-1}A[B]$
is diagonal. Unfortunately, the square length of $v_1$ is $4 - 2\sqrt{2}$. To normalize its length to 1,
we must divide by $\sqrt{4 - 2\sqrt{2}}$. It is unpleasant to continue this computation by hand.
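The computation that is unpleasant by hand is easy to check in floating point. The sketch below takes A with rows (1, 1) and (1, -1) and the eigenvectors (1, a) and (1, -b) with a = √2 - 1 and b = √2 + 1, normalizes them, and forms the entries of the diagonalized matrix. The helper names are illustrative.

```python
import math

A = [[1, 1], [1, -1]]
a = math.sqrt(2) - 1
b = math.sqrt(2) + 1
v1 = [1.0, a]      # eigenvector for  sqrt(2)
v2 = [1.0, -b]     # eigenvector for -sqrt(2)


def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def dot(v, w):
    """Standard dot product."""
    return sum(x * y for x, y in zip(v, w))


def normalize(v):
    """Scale v to length 1."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


u1, u2 = normalize(v1), normalize(v2)

# If P has columns u1 and u2, the (i, j) entry of P^t A P is u_i . (A u_j).
D = [[dot(ui, apply(A, uj)) for uj in (u1, u2)] for ui in (u1, u2)]

print(dot(v1, v1), 4 - 2 * math.sqrt(2))   # square length of v1, two ways
print(D)
```

Up to rounding, `D` is diagonal with entries √2 and -√2, and the two printed values of the square length agree.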

If a quadratic equation $f(x_1, x_2) = 0$ is given, we can determine the type of conic that
it represents most simply by allowing arbitrary changes of basis, not necessarily orthogonal 
ones. A nonorthogonal change will distort an ellipse but it will not change an ellipse into a 
hyperbola, a parabola, or a degenerate conic. If we wish only to identify the type of conic, 
arbitrary changes of basis are permissible. 

We proceed as in (8.7.6), but with a nonorthogonal change of basis:


$X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = PX'$, $A' = P^tAP = \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}$, $B' = BP = [2 \; 0]$.


Dropping primes, the new equation becomes $x_1^2 - 2x_2^2 + 2x_1 - 1 = 0$, and completing the
square yields $x_1^2 - 2x_2^2 - 2 = 0$, a hyperbola. So the original locus is a hyperbola too.

By the way, the matrix A is positive or negative definite in the equation of an ellipse
and indefinite in the equation of a hyperbola. The matrix A shown above is indefinite. We
could have seen right away that the locus we have just inspected was either a hyperbola or a
degenerate conic. □
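The nonorthogonal reduction in Example 8.7.10 can be replayed in exact arithmetic. The sketch below assumes the substitution $x_1 = x_1' - x_2'$, $x_2 = x_2'$, which is one change of basis that produces the equation $x_1^2 - 2x_2^2 + 2x_1 - 1 = 0$ quoted in the example, and then completes the square using the constant from (8.7.7). The helper names are illustrative.

```python
from fractions import Fraction

A = [[1, 1], [1, -1]]
B = [2, 2]
c = -1

# Assumed change of basis X = P X' (not orthogonal).
P = [[1, -1], [0, 1]]


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


A1 = matmul(transpose(P), matmul(A, P))   # A' = P^t A P
B1 = matmul([B], P)[0]                    # B' = B P

# Completing squares: x_i = x_i' - b_i / (2 a_ii) changes the constant by
# -b_i^2 / (4 a_ii). This only applies to indices with a_ii != 0.
c1 = Fraction(c) - sum(Fraction(B1[i] ** 2, 4 * A1[i][i])
                       for i in range(2) if A1[i][i] != 0)

print(A1, B1, c1)
```

The diagonal entries of `A1` have opposite signs and the new constant `c1` is nonzero, so the locus is a hyperbola, in agreement with the example.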


The method used to describe conics can be applied to classify quadrics in any dimension.
The general quadratic equation has the form f = 0, where


(8.7.11) $f(x_1, \ldots, x_n) = \sum_i a_{ii}x_i^2 + \sum_{i<j} 2a_{ij}x_ix_j + \sum_i b_ix_i + c$.


Let matrices A and B be defined by


$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{1n} & \cdots & a_{nn} \end{bmatrix}$, $B = [b_1 \; \cdots \; b_n]$.

Then

(8.7.12) $f(x_1, \ldots, x_n) = X^tAX + BX + c$.


The associated quadratic form is

(8.7.13) $q(x_1, \ldots, x_n) = X^tAX$.


According to the Spectral Theorem for symmetric operators, the matrix A can be diagonalized
by an orthogonal transformation P. When A is diagonal, the linear terms and the constant
term may be eliminated, so far as possible, as above. Here is the classification in three
variables:


Theorem 8.7.14 The congruence classes of nondegenerate quadrics in $\mathbb{R}^3$ are represented
by the following loci, in which the $a_{ii}$ are positive real numbers:

Ellipsoids: $a_{11}x_1^2 + a_{22}x_2^2 + a_{33}x_3^2 - 1 = 0$,
One-sheeted hyperboloids: $a_{11}x_1^2 + a_{22}x_2^2 - a_{33}x_3^2 - 1 = 0$,
Two-sheeted hyperboloids: $a_{11}x_1^2 - a_{22}x_2^2 - a_{33}x_3^2 - 1 = 0$,
Elliptic paraboloids: $a_{11}x_1^2 + a_{22}x_2^2 - x_3 = 0$,
Hyperbolic paraboloids: $a_{11}x_1^2 - a_{22}x_2^2 - x_3 = 0$. □


A word is in order about the case that B and c are zero in the quadratic polynomial
$f(x_1, x_2, x_3)$ (8.7.12), i.e., that f is equal to its quadratic form q (8.7.13). The locus {q = 0}
is considered degenerate, but is interesting. Let's call it Q. Since all of the terms $a_{ij}x_ix_j$ that
appear in q have degree 2,


(8.7.15) $q(\lambda x_1, \lambda x_2, \lambda x_3) = \lambda^2 q(x_1, x_2, x_3)$


for any real number $\lambda$. Consequently, if a point $X \neq 0$ lies on Q, i.e., if q(X) = 0, then
$q(\lambda X) = 0$ too, so $\lambda X$ lies on Q for every real number $\lambda$. Therefore Q is a union of lines
through the origin, a double cone.

For example, suppose that q is the diagonal quadratic form


$a_{11}x_1^2 + a_{22}x_2^2 - x_3^2$,

where the $a_{ii}$ are positive. When we intersect the locus Q with the plane $x_3 = 1$, we obtain an
ellipse $a_{11}x_1^2 + a_{22}x_2^2 = 1$ in the remaining variables. In this case Q is the union of lines
through the origin and the points of this ellipse.


(8.7.16) Hyperboloids Near to a Cone. 


Notice that q(x) is positive in the exterior of the double cone, and negative in its interior.
(The value of q(x) changes sign only when one crosses Q.) So for any r > 0, the locus
$a_{11}x_1^2 + a_{22}x_2^2 - x_3^2 - r = 0$ lies in the exterior of the double cone. It is a one-sheeted
hyperboloid, while the locus $a_{11}x_1^2 + a_{22}x_2^2 - x_3^2 + r = 0$ lies in the interior, and is a
two-sheeted hyperboloid.

Similar reasoning can be applied to any homogeneous polynomial $g(x_1, \ldots, x_n)$, any
polynomial in which all of the terms have the same degree d. If g is homogeneous of degree
d, $g(\lambda x) = \lambda^d g(x)$, and because of this, the locus {g = 0} will also be a union of lines
through the origin.


8.8 SKEW-SYMMETRIC FORMS 


The description of skew-symmetric bilinear forms is the same for any field of scalars, so in 
this section we allow vector spaces over an arbitrary field F. However, as usual, it may be 
best to think of real vector spaces when going through this for the first time. 
A bilinear form ( , ) on a vector space V is skew-symmetric if it has either one of the
following equivalent properties:


(8.8.1) (v, v) = 0 for all v in V, or


(8.8.2) (u, v) = -(v, u) for all u and v in V.


To be more precise, these conditions are equivalent whenever the field of scalars has
characteristic different from 2. If F has characteristic 2, the first condition (8.8.1) is the
correct one. The fact that (8.8.1) implies (8.8.2) is proved by expanding (u + v, u + v):


(u + v, u + v) = (u, u) + (u, v) + (v, u) + (v, v),


and using the fact that (u, u) = (v, v) = (u + v, u + v) = 0. Conversely, if the second
condition holds, then setting u = v gives us (v, v) = -(v, v), hence 2(v, v) = 0, and it follows
that (v, v) = 0, unless 2 = 0.

A bilinear form ( , ) is skew-symmetric if and only if its matrix A with respect to an
arbitrary basis is a skew-symmetric matrix, meaning that $a_{ji} = -a_{ij}$ and $a_{ii} = 0$, for all i and
j. Except in characteristic 2, the condition $a_{ii} = 0$ follows from $a_{ji} = -a_{ij}$ when one sets
i = j.


The determinant form (X, Y) on $\mathbb{R}^2$, the form defined by


(8.8.3) $(X, Y) = \det \begin{bmatrix} x_1 & y_1 \\ x_2 & y_2 \end{bmatrix} = x_1y_2 - x_2y_1$,


is a simple example of a skew-symmetric form. Linearity and skew symmetry in the columns
are familiar properties of the determinant. The matrix of the determinant form (8.8.3) with
respect to the standard basis of $\mathbb{R}^2$ is


(8.8.4) $S = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$.


We will see in Theorem 8.8.7 below that every nondegenerate skew-symmetric form looks
very much like this one.
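As a small illustration, the determinant form can be evaluated either directly or as $X^tSY$ with the 2 × 2 matrix S whose rows are (0, 1) and (-1, 0). The sketch below checks on sample vectors that the two agree and that the form is skew-symmetric. The function names are hypothetical.

```python
def det_form(X, Y):
    """The determinant form on R^2: det of the matrix with columns X and Y."""
    return X[0] * Y[1] - X[1] * Y[0]


S = [[0, 1], [-1, 0]]


def matrix_form(S, X, Y):
    """Evaluate X^t S Y for a square matrix S."""
    n = len(S)
    return sum(X[i] * S[i][j] * Y[j] for i in range(n) for j in range(n))


X, Y = (1, 2), (3, 4)
print(det_form(X, Y), matrix_form(S, X, Y))   # the two values agree
print(det_form(Y, X))                         # the sign flips
print(det_form(X, X))                         # every vector is self-orthogonal
```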


Skew-symmetric forms also come up when one counts intersections of paths on a 
surface. To obtain a count that doesn’t change when the paths are deformed, one can adopt 
the rule used for traffic flow: A vehicle that enters an intersection from the right has the 
right of way. If two paths X and Y on the surface intersect at a point p, we define the
intersection number $(X, Y)_p$ at p as follows: If X enters the intersection to the right of
Y, then $(X, Y)_p = 1$, and if X enters to the left of Y, then $(X, Y)_p = -1$. Then in either
case, $(X, Y)_p = -(Y, X)_p$. The total intersection number (X, Y) is obtained by adding these
contributions for all intersection points. In this way the contributions arising when X crosses
Y and then turns back to cross again cancel. This is how topologists define a product in
"homology."

(8.8.5) Oriented Intersections (X, Y).


Many of the definitions given in Section 8.4 can be used also with skew-symmetric
forms. In particular, two vectors v and w are orthogonal if (v, w) = 0. It is true once more
that $v \perp w$ if and only if $w \perp v$, but there is a difference: When the form is skew-symmetric,
every vector v is self-orthogonal: $v \perp v$. And since all vectors are self-orthogonal, there can
be no orthogonal bases.

As is true for symmetric forms, a skew-symmetric form is nondegenerate if and only if 
its matrix with respect to an arbitrary basis is nonsingular. The proof of the next theorem is 
the same as for Theorem 8.4.5. 


Theorem 8.8.6 Let ( , ) be a skew-symmetric form on a vector space V, and let W be a
subspace of V on which the form is nondegenerate. Then V is the orthogonal sum $W \oplus W^\perp$.
If the form is nondegenerate on V and on W, it is nondegenerate on $W^\perp$ too. □


Theorem 8.8.7 


(a) Let V be a vector space of positive dimension m over a field F, and let ( , ) be a
nondegenerate skew-symmetric form on V. The dimension of V is even, and V has a
basis B such that the matrix $S_0$ of the form with respect to that basis is made up of
diagonal blocks, where all blocks are equal to the 2 × 2 matrix S shown above (8.8.4):


$S_0 = \begin{bmatrix} S & & \\ & \ddots & \\ & & S \end{bmatrix}$.


(b) Matrix form: Let A be an invertible skew-symmetric m × m matrix. There is an invertible
matrix P such that $P^tAP = S_0$ is as above.


Proof. (a) Since the form is nondegenerate, we may choose nonzero vectors $v_1$ and $v_2$ such
that $(v_1, v_2) = c$ is not zero. We adjust $v_2$ by a scalar factor to make c = 1. Since $(v_1, v_2) \neq 0$
but $(v_1, v_1) = 0$, these vectors are independent. Let W be the two-dimensional subspace with
basis $(v_1, v_2)$. The matrix of the form restricted to W is S. Since this matrix is invertible, the
form is nondegenerate on W, so V is the direct sum $W \oplus W^\perp$, and the form is nondegenerate
on $W^\perp$. By induction, we may assume that there is a basis $(v_3, \ldots, v_m)$ for $W^\perp$ such that
the matrix of the form on this subspace has the form (8.8.7). Then $(v_1, v_2, v_3, \ldots, v_m)$ is the
required basis for V. □
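The inductive proof translates into a short procedure: pick a vector, find a partner that pairs with it to 1, then replace the remaining vectors by their components orthogonal to the pair, and repeat. The sketch below is one way to carry this out over the rationals for a form given by a skew-symmetric matrix A. The function names `form` and `standard_skew_basis` are hypothetical, and the procedure is an illustration of the argument rather than an optimized algorithm.

```python
from fractions import Fraction


def form(A, u, v):
    """The value u^t A v of the bilinear form with matrix A."""
    n = len(A)
    return sum(u[i] * A[i][j] * v[j] for i in range(n) for j in range(n))


def standard_skew_basis(A):
    """Return a basis (v1, v2, v3, v4, ...) with (v1, v2) = 1, (v3, v4) = 1, ...
    and all other pairings of distinct blocks zero.
    Raises ValueError if the form is degenerate."""
    n = len(A)
    rest = [[Fraction(int(i == j)) for i in range(n)] for j in range(n)]
    basis = []
    while rest:
        v1 = rest.pop(0)
        k = next((i for i, w in enumerate(rest) if form(A, v1, w) != 0), None)
        if k is None:
            raise ValueError("the form is degenerate")
        w = rest.pop(k)
        c = form(A, v1, w)
        v2 = [x / c for x in w]               # now (v1, v2) = 1
        basis += [v1, v2]
        adjusted = []
        for w in rest:
            alpha = form(A, v2, w)
            beta = form(A, v1, w)
            # w + alpha*v1 - beta*v2 is orthogonal to both v1 and v2.
            adjusted.append([wi + alpha * a - beta * b
                             for wi, a, b in zip(w, v1, v2)])
        rest = adjusted
    return basis


A = [[0, 2, 1, 0],
     [-2, 0, 0, 3],
     [-1, 0, 0, 1],
     [0, -3, -1, 0]]
basis = standard_skew_basis(A)
gram = [[form(A, u, v) for v in basis] for u in basis]
for row in gram:
    print([int(x) for x in row])
```

The matrix `gram` of the form with respect to the computed basis consists of 2 × 2 blocks equal to S along the diagonal. For a skew-symmetric matrix of odd size the procedure runs out of partners and reports degeneracy, which matches the statement that the dimension must be even.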
Corollary 8.8.8 If A is an invertible m × m skew-symmetric matrix, then m is an even integer. □

Let ( , ) be a nondegenerate skew-symmetric form on a vector space of dimension 2n.
We rearrange the basis referred to in Theorem 8.8.7 as $(v_1, v_3, \ldots, v_{2n-1}; v_2, v_4, \ldots, v_{2n})$.
The matrix will be changed into a block matrix made up of n × n blocks

(8.8.9) $S = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}$.


8.9 SUMMARY 


We collect some of the terms that we have used together here. They are used for a symmetric 
or a skew-symmetric form on a real vector space and also for a Hermitian form on a complex 
vector space. 


orthogonal vectors: Two vectors v and w are orthogonal (written $v \perp w$) if (v, w) = 0.


orthogonal space to a subspace: The orthogonal space $W^\perp$ to a subspace W of V is the set
of vectors v that are orthogonal to every vector in W:


$W^\perp = \{v \in V \mid (v, W) = 0\}$.


null vector: A null vector is a vector that is orthogonal to every vector in V.


nullspace: The nullspace N of the given form is the set of null vectors:
$N = \{v \mid (v, V) = 0\}$.


nondegenerate form: The form is nondegenerate if its nullspace is the zero space {0}. This
means that for every nonzero vector v, there is a vector v' such that $(v, v') \neq 0$.


nondegeneracy on a subspace: The form is nondegenerate on a subspace W if its restriction
to W is a nondegenerate form, or if $W \cap W^\perp = \{0\}$. If the form is nondegenerate on a
subspace W, then $V = W \oplus W^\perp$.


orthogonal basis: A basis $B = (v_1, \ldots, v_n)$ of V is orthogonal if the vectors are mutually
orthogonal, that is, if $(v_i, v_j) = 0$ for all indices i and j with $i \neq j$. The matrix of the form
with respect to an orthogonal basis is a diagonal matrix. Orthogonal bases exist for any
symmetric or Hermitian form, but not for a skew-symmetric form.


orthonormal basis: A basis $B = (v_1, \ldots, v_n)$ is orthonormal if $(v_i, v_j) = 0$ for $i \neq j$ and
$(v_i, v_i) = 1$. An orthonormal basis for a symmetric or Hermitian form exists if and only if
the form is positive definite.


orthogonal projection: If a symmetric or Hermitian form is nondegenerate on a subspace
W, the orthogonal projection to W is the unique linear transformation $\pi: V \to W$ such that
$\pi(v) = v$ if v is in W, and $\pi(v) = 0$ if v is in the orthogonal space $W^\perp$.

If the form is nondegenerate on a subspace W and if $(w_1, \ldots, w_k)$ is an orthogonal
basis for W, the orthogonal projection is given by the formula $\pi(v) = w_1c_1 + \cdots + w_kc_k$,
where

$c_i = \dfrac{(w_i, v)}{(w_i, w_i)}$.
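The projection formula is straightforward to compute with. The sketch below works with a real symmetric form given by a matrix A and an orthogonal basis of W, using exact fractions. The names `form` and `project` and the sample vectors are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction


def form(A, v, w):
    """The value v^t A w of the symmetric form with matrix A."""
    n = len(A)
    return sum(v[i] * A[i][j] * w[j] for i in range(n) for j in range(n))


def project(A, basis, v):
    """Orthogonal projection of v to the span of `basis`.
    `basis` must be orthogonal for the form, with (w, w) != 0 for each member."""
    result = [Fraction(0)] * len(v)
    for w in basis:
        c = Fraction(form(A, w, v), form(A, w, w))   # c_i = (w_i, v) / (w_i, w_i)
        result = [r + c * x for r, x in zip(result, w)]
    return result


# Dot product on R^3 and an orthogonal basis of a plane W (sample data).
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
W_basis = [(1, 1, 0), (1, -1, 1)]
v = (1, 2, 3)

p = project(I3, W_basis, v)
residual = [x - y for x, y in zip(v, p)]
print(p)
print([form(I3, residual, w) for w in W_basis])   # residual is orthogonal to W
```

The difference between v and its projection pairs to zero with each basis vector of W, which is the defining property of the orthogonal projection.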
Spectral Theorem:
• If A is normal, there is a unitary matrix P such that $P^*AP$ is diagonal.
• If A is Hermitian, there is a unitary matrix P such that $P^*AP$ is a real diagonal matrix.
• In the unitary group $U_n$, every matrix is conjugate to a diagonal matrix.
• If A is a real symmetric matrix, there is an orthogonal matrix P such that $P^tAP$ is diagonal.


The table below compares various concepts used for real and for complex vector 
spaces. 


              Real Vector Spaces           Complex Vector Spaces

forms
              symmetric                    Hermitian
              (v, w) = (w, v)              $(v, w) = \overline{(w, v)}$

matrices
              symmetric                    Hermitian
              $A^t = A$                    $A^* = A$
              orthogonal                   unitary
              $A^tA = I$                   $A^*A = I$
                                           normal
                                           $A^*A = AA^*$

operators
              symmetric                    Hermitian
              (Tv, w) = (v, Tw)            (Tv, w) = (v, Tw)
              orthogonal                   unitary
              (v, w) = (Tv, Tw)            (v, w) = (Tv, Tw)
                                           normal
                                           $(Tv, Tw) = (T^*v, T^*w)$
                                           arbitrary
                                           $(v, Tw) = (T^*v, w)$


In helping geometry, modern algebra is helping itself above all. 


—Oscar Zariski 
EXERCISES


Section 1 Real Bilinear Forms 


1.1. Show that a bilinear form ( , ) on a real vector space V is a sum of a symmetric form and
a skew-symmetric form.


Section 2 Symmetric Forms 
2.1. Prove that the maximal entries of a positive definite, symmetric, real matrix are on the 
diagonal. 


2.2. Let A and A' be symmetric matrices related by $A' = P^tAP$, where P is invertible. Is it
true that the ranks of A and of A' are equal?


Section 3 Hermitian Forms 
3.1. Is a complex n × n matrix A such that $X^*AX$ is real for all X Hermitian?


3.2. Let ( , ) be a positive definite Hermitian form on a complex vector space V, and let { , }
and [ , ] be its real and imaginary parts, the real-valued forms defined by


(v, w) = {v, w} + [v, w]i.

Prove that when V is made into a real vector space by restricting scalars to $\mathbb{R}$, { , } is a
positive definite symmetric form, and [ , ] is a skew-symmetric form.

3.3. The set of n × n Hermitian matrices forms a real vector space. Find a basis for this space.

3.4. Prove that if A is an invertible matrix, then $A^*A$ is Hermitian and positive definite.


3.5. Let A and B be positive definite Hermitian matrices. Decide which of the following
matrices are necessarily positive definite Hermitian: $A^2$, $A^{-1}$, $AB$, $A + B$.

3.6. Use the characteristic polynomial to prove that the eigenvalues of a 2 x2 Hermitian 
matrix A are real. 


Section 4 Orthogonality 
4.1. What is the inverse of a matrix whose columns are orthogonal? 


4.2. Let ( , ) be a bilinear form on a real vector space V, and let v be a vector such that
$(v, v) \neq 0$. What is the formula for orthogonal projection to the space $W = v^\perp$ orthogonal
to v?


4.3. Let A be a real m × n matrix. Prove that $B = A^tA$ is positive semidefinite, i.e., that
$X^tBX \geq 0$ for all X, and that A and B have the same rank.


4.4. Make a sketch showing the positions of some orthogonal vectors in $\mathbb{R}^2$, when the form is
$(X, Y) = x_1y_1 - x_2y_2$.

4.5. Find an orthogonal basis for the form on $\mathbb{R}^n$ whose matrix is

(a) [2 × 2 matrix illegible in source], (b) $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}$.


4.6. Extend the vector $X_1 = \frac{1}{2}(1, -1, 1, 1)^t$ to an orthonormal basis for $\mathbb{R}^4$.

4.7. Apply the Gram-Schmidt procedure to the basis $(1, 1, 0)^t$, $(1, 0, 1)^t$, $(0, 1, 1)^t$ of $\mathbb{R}^3$.

4.8. Let A = [2 × 2 matrix illegible in source]. Find an orthonormal basis for $\mathbb{R}^2$ with respect to the form $X^tAY$.


4.9. Find an orthonormal basis for the vector space P of all real polynomials of degree at most
2, with the symmetric form defined by


$(f, g) = \int_{-1}^{1} f(x)g(x)\,dx$.


4.10. Let V denote the vector space of real n × n matrices. Prove that $(A, B) = \operatorname{trace}(A^tB)$
defines a positive definite bilinear form on V, and find an orthonormal basis for this form.

4.11. Let $W_1$, $W_2$ be subspaces of a vector space V with a symmetric bilinear form. Prove
(a) $(W_1 + W_2)^\perp = W_1^\perp \cap W_2^\perp$, (b) $W \subset W^{\perp\perp}$, (c) If $W_1 \subset W_2$, then $W_1^\perp \supset W_2^\perp$.


4.12. Let $V = \mathbb{R}^{2 \times 2}$ be the vector space of real 2 × 2 matrices.


(a) Determine the matrix of the bilinear form (A, B) = trace(AB) on V with respect to
the standard basis $(e_{ij})$.

(b) Determine the signature of this form.

(c) Find an orthogonal basis for this form.

(d) Determine the signature of the form trace AB on the space $\mathbb{R}^{n \times n}$ of real n × n matrices.


*4.13. (a) Decide whether or not the rule $(A, B) = \operatorname{trace}(A^*B)$ defines a Hermitian form on
the space $\mathbb{C}^{n \times n}$ of complex matrices, and if so, determine its signature.
(b) Answer the same question for the form defined by (A, B) = trace(AB).

4.14. The matrix form of Theorem 8.4.10 asserts that if A is a real symmetric matrix, there
exists an invertible matrix P such that $P^tAP$ is diagonal. Prove this by row and column
operations.

4.15. Let W be the subspace of $\mathbb{R}^3$ spanned by the vectors $(1, 1, 0)^t$ and $(0, 1, 1)^t$. Determine
the orthogonal projection of the vector $(1, 0, 0)^t$ to W.

4.16. Let V be the real vector space of 3 × 3 matrices with the bilinear form $(A, B) = \operatorname{trace} A^tB$,
and let W be the subspace of skew-symmetric matrices. Compute the orthogonal projection
to W with respect to this form, of the matrix


$\begin{bmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \\ 1 & 3 & 0 \end{bmatrix}$.


4.17. Use the method of (3.5.13) to compute the coordinate vector of the vector $(x_1, x_2, x_3)^t$
with respect to the basis B described in Example 8.4.14, and compare your answer with
the projection formula.


4.18. Find the matrix of a projection $\pi: \mathbb{R}^3 \to \mathbb{R}^2$ such that the image of the standard basis of
$\mathbb{R}^3$ forms an equilateral triangle and $\pi(e_1)$ points in the direction of the x-axis.


4.19. Let W be a two-dimensional subspace of $\mathbb{R}^3$, and consider the orthogonal projection $\pi$ of
$\mathbb{R}^3$ onto W. Let $(a_i, b_i)^t$ be the coordinate vector of $\pi(e_i)$, with respect to a chosen
orthonormal basis of W. Prove that $(a_1, a_2, a_3)$ and $(b_1, b_2, b_3)$ are orthogonal unit vectors.

4.20. Prove the criterion for positive definiteness given in Theorem 8.4.19. Does the criterion
carry over to Hermitian matrices?


4.21. Prove Sylvester's Law (see 8.4.17).


Hint: Begin by showing that if $W_1$ and $W_2$ are subspaces of V and if the form is positive
definite on $W_1$ and negative semi-definite on $W_2$, then $W_1$ and $W_2$ are independent.


Section 5 Euclidean Spaces and Hermitian Spaces 
5.1. Let V be a Euclidean space.
(a) Prove the Schwarz inequality $|(v, w)| \leq |v||w|$.
(b) Prove the parallelogram law $|v + w|^2 + |v - w|^2 = 2|v|^2 + 2|w|^2$.
(c) Prove that if |v| = |w|, then $(v + w) \perp (v - w)$.

5.2. Let W be a subspace of a Euclidean space V. Prove that $W = W^{\perp\perp}$.


*5.3. Let $w \in \mathbb{R}^n$ be a vector of length 1, and let U denote the orthogonal space $w^\perp$. The
reflection $r_w$ about U is defined as follows: We write a vector v in the form v = cw + u,
where $u \in U$. Then $r_w(v) = -cw + u$.

(a) Prove that the matrix $P = I - 2ww^t$ is orthogonal.
(b) Prove that multiplication by P is a reflection about the orthogonal space U.
(c) Let u, v be vectors of equal length in $\mathbb{R}^n$. Determine a vector w such that Pu = v.


5.4. Let T be a linear operator on $V = \mathbb{R}^n$ whose matrix A is a real symmetric matrix.


(a) Prove that V is the orthogonal sum $V = (\ker T) \oplus (\operatorname{im} T)$.
(b) Prove that T is an orthogonal projection onto im T if and only if, in addition to being
symmetric, $A^2 = A$.


5.5. Let P be a unitary matrix, and let $X_1$ and $X_2$ be eigenvectors for P, with distinct
eigenvalues $\lambda_1$ and $\lambda_2$. Prove that $X_1$ and $X_2$ are orthogonal with respect to the standard
Hermitian form on $\mathbb{C}^n$.


5.6. What complex numbers might occur as eigenvalues of a unitary matrix? 


Section 6 The Spectral Theorem 


6.1. Prove Proposition 8.6.3(c), (d). 


6.2. Let T be a symmetric operator on a Euclidean space. Using Proposition 8.6.9, prove that
if v is a vector and if $T^2v = 0$, then Tv = 0.


6.3. What does the Spectral Theorem tell us about a real 3 x3 matrix that is both symmetric 
and orthogonal? 


6.4. What can be said about a matrix A such that A*A is diagonal? 


6.5. Prove that if A is a real skew-symmetric matrix, then iA is a Hermitian matrix. What 
does the Spectral Theorem tell us about a real skew-symmetric matrix? 


6.6. Prove that an invertible matrix A is normal if and only if $A^*A^{-1}$ is unitary.


6.7. Let P be a real matrix that is normal and has real eigenvalues. Prove that P is 
symmetric. 


6.8. Let V be the space of differentiable complex-valued functions on the unit circle in the
complex plane, and for $f, g \in V$, define


$(f, g) = \int_0^{2\pi} \overline{f(\theta)}\,g(\theta)\,d\theta$.


(a) Show that this form is Hermitian and positive definite.

(b) Let W be the subspace of V of functions $f(e^{i\theta})$, where f is a polynomial of degree
< n. Find an orthonormal basis for W.

(c) Show that $T = i\frac{d}{d\theta}$ is a Hermitian operator on V, and determine its eigenvalues
on W.


6.9. Determine the signature of the form on $\mathbb{R}^2$ whose matrix is [2 × 2 matrix illegible in source], and determine an
orthogonal matrix P such that $P^tAP$ is diagonal.


6.10. Prove that if T is a Hermitian operator on a Hermitian space V, the rule {v, w} = (v, Tw)
defines a second Hermitian form on V.


6.11. Prove that eigenvectors associated to distinct eigenvalues of a Hermitian matrix A are
orthogonal.


6.12. Find a unitary matrix P so that $P^*AP$ is diagonal, when A = [2 × 2 matrix illegible in source].


6.13. Find a real orthogonal matrix P so that $P^tAP$ is diagonal, when A is the matrix

(a) $\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$, (b) $\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$, (c) $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$.


6.14. Prove that a real symmetric matrix A is positive definite if and only if its eigenvalues are
positive.


6.15. Prove that for any square matrix A, $\ker A = (\operatorname{im} A^*)^\perp$, and that if A is normal,
$\ker A = (\operatorname{im} A)^\perp$.


*6.16. Let $\zeta = e^{2\pi i/n}$, and let A be the n × n matrix whose entries are $a_{jk} = \zeta^{jk}/\sqrt{n}$. Prove that
A is unitary.


*6.17. Let A, B be Hermitian matrices that commute. Prove that there is a unitary matrix P such
that $P^*AP$ and $P^*BP$ are both diagonal.


6.18. Use the Spectral Theorem to prove that a positive definite real symmetric n × n matrix A
has the form $A = P^tP$ for some P.


6.19. Prove that the cyclic shift operator


$\begin{bmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ 1 & & & & 0 \end{bmatrix}$


is unitary, and determine its diagonalization.
6.20. Prove that the circulant, the matrix below, is normal.


$\begin{bmatrix} c_0 & c_1 & \cdots & c_n \\ c_n & c_0 & \cdots & c_{n-1} \\ \vdots & & & \vdots \\ c_1 & c_2 & \cdots & c_0 \end{bmatrix}$

6.21. What conditions on the eigenvalues of a normal matrix A imply that A is Hermitian? 
That A is unitary? 


6.22. Prove the Spectral Theorem for symmetric operators. 


Section 7 Conics and Quadrics 


7.1. Determine the type of the quadric $x^2 + 4xy + 2xz + z^2 + 3x + z - 6 = 0$.


7.2. Suppose that the quadratic equation (8.7.1) represents an ellipse. Instead of diagonalizing 
the form and then making a translation to reduce to the standard type, we could make 
the translation first. How can one determine the required translation? 


7.3. Give a necessary and sufficient condition, in terms of the coefficients of its equation, for 
a conic to be a circle. 


7.4. Describe the degenerate quadrics geometrically.


Section 8 Skew-Symmetric Forms
8.1. Let A be an invertible, real, skew-symmetric matrix. Prove that $A^2$ is symmetric and
negative definite.


8.2. Let W be a subspace on which a real skew-symmetric form is nondegenerate. Find a
formula for the orthogonal projection $\pi: V \to W$.


8.3. Let S be a real skew-symmetric matrix. Prove that I + S is invertible, and that $(I - S)(I + S)^{-1}$
is orthogonal.


*8.4. Let A be a real skew-symmetric matrix.


(a) Prove that $\det A \geq 0$.
(b) Prove that if A has integer entries, then det A is the square of an integer.


Miscellaneous Problems 


M.1. According to Sylvester's Law, every 2 × 2 real symmetric matrix is congruent to exactly one
of six standard types. List them. If we consider the operation of $GL_2$ on 2 × 2 matrices by
$P * A = PAP^t$, then Sylvester's Law asserts that the symmetric matrices form six orbits.
We may view the symmetric matrices as points in $\mathbb{R}^3$, letting (x, y, z) correspond to the
matrix $\begin{bmatrix} x & y \\ y & z \end{bmatrix}$. Describe the decomposition of $\mathbb{R}^3$ into orbits geometrically, and make a
clear drawing depicting it.
Hint: If you don't get a beautiful result, you haven't understood the configuration.

M.2. Describe the symmetry of the matrices AB + BA and AB - BA in the following cases.
(a) A, B symmetric, (b) A, B Hermitian, (c) A, B skew-symmetric,
(d) A symmetric, B skew-symmetric.


M.3. With each of the following types of matrices, describe the possible determinants and
eigenvalues.

(a) real orthogonal, (b) unitary, (c) Hermitian, (d) real symmetric, negative
definite, (e) real skew-symmetric.


M.4. Let E be an m × n complex matrix. Prove that the matrix [block matrix illegible in source] is invertible.


M.5. The vector cross product is $x \times y = (x_2y_3 - x_3y_2,\; x_3y_1 - x_1y_3,\; x_1y_2 - x_2y_1)^t$. Let v be a fixed
vector in $\mathbb{R}^3$, and let T be the linear operator $T(x) = (x \times v) \times v$.


(a) Show that this operator is symmetric. You may use general properties of the scalar
triple product $\det[x|y|z] = (x \times y) \cdot z$, but not the matrix of the operator.

(b) Compute the matrix.


M.6. (a) What is wrong with the following argument? Let P be a real orthogonal matrix.
Let X be a (possibly complex) eigenvector of P, with eigenvalue $\lambda$. Then $X^tP^tX =
(PX)^tX = \lambda X^tX$. On the other hand, $X^tP^tX = X^t(P^{-1}X) = \lambda^{-1}X^tX$. Therefore
$\lambda = \lambda^{-1}$, and so $\lambda = \pm 1$.

(b) State and prove a correct theorem based on the error in this argument.


*M.7. Let A be a real m × n matrix. Prove that there are orthogonal matrices P in $O_m$ and Q
in $O_n$ such that PAQ is diagonal, with non-negative diagonal entries.


M.8. (a) Show that if A is a nonsingular complex matrix, there is a positive definite Hermitian
matrix B such that $B^2 = A^*A$, and that B is uniquely determined by A.

(b) Let A be a nonsingular matrix, and let B be a positive definite Hermitian matrix such
that $B^2 = A^*A$. Show that $AB^{-1}$ is unitary.

(c) Prove the Polar decomposition: Every nonsingular matrix A is a product A = UP,
where P is positive definite Hermitian and U is unitary.

(d) Prove that the Polar decomposition is unique.

(e) What does this say about the operation of left multiplication by the unitary group $U_n$
on the group $GL_n$?


*M.9. Let V be a Euclidean space of dimension n, and let $S = (v_1, \ldots, v_k)$ be a set of vectors
in V. A positive combination of S is a linear combination $p_1v_1 + \cdots + p_kv_k$ in which all
coefficients $p_i$ are positive. The subspace $U = \{v \mid (v, w) = 0\}$ of V of vectors orthogonal
to a vector w is called a hyperplane. A hyperplane divides the space V into two half
spaces $\{v \mid (v, w) \geq 0\}$ and $\{v \mid (v, w) \leq 0\}$.


(a) Prove that the following are equivalent:
• S is not contained in any half space.
• For every nonzero vector w in V, $(v_i, w) < 0$ for some i = 1, ..., k.


(b) Let S' be the set obtained by deleting $v_k$ from S. Prove that if S is not contained in a
half space, then S' spans V.


(c) Prove that the following conditions are equivalent:


(i) S is not contained in a half space.
(ii) Every vector in V is a positive combination of S.
(iii) S spans V and 0 is a positive combination of S.
Hint: To show that (i) implies (ii) or (iii), I recommend projecting to the space U
orthogonal to $v_k$. That will allow you to use induction.


M.10. The row and column indices in the n × n Fourier matrix A run from 0 to n - 1, and the i, j
entry is $\zeta^{ij}$, with $\zeta = e^{2\pi i/n}$. This matrix solves the following interpolation problem: Given
complex numbers $b_0, \ldots, b_{n-1}$, find a complex polynomial $f(t) = c_0 + c_1t + \cdots + c_{n-1}t^{n-1}$
such that $f(\zeta^\nu) = b_\nu$.


(a) Explain how the matrix solves the problem.
(b) Prove that A is symmetric and normal, and compute $A^2$.
*(c) Determine the eigenvalues of A.


M.11. Let A be a real n × n matrix. Prove that A defines an orthogonal projection to its image
W if and only if $A^2 = A = A^tA$.


M.12. Let A be a real n × n orthogonal matrix.


(a) Let X be a complex eigenvector of A with complex eigenvalue $\lambda$. Prove that $X^tX = 0$.
Write the eigenvector as X = R + Si where R and S are real vectors. Show that
the space W spanned by R and S is A-invariant, and describe the restriction of the
operator A to W.

(b) Prove that there is a real orthogonal matrix P such that $P^tAP$ is a block diagonal
matrix made up of 1 × 1 and 2 × 2 blocks, and describe those blocks.


M.13. Let $V = \mathbb{R}^n$, and let $(X, Y) = X^tAY$, where A is a symmetric matrix. Let W be the
subspace of V spanned by the columns of an n × r matrix M of rank r, and let $\pi: V \to W$
denote the orthogonal projection of V to W with respect to the form ( , ). One can
compute $\pi$ in the form $\pi(X) = MY$ by setting up and solving a suitable system of linear
equations for Y. Determine the matrix of $\pi$ explicitly in terms of A and M. Check your
result in the case that r = 1 and ( , ) is dot product. What hypotheses on A and M are
necessary?


M.14. What is the maximal number of vectors $v_i$ in $\mathbb{R}^n$ such that $(v_i \cdot v_j) < 0$ for all $i \neq j$?


M.15. This problem is about the space V of real polynomials in the variables x and y. If f is
a polynomial, $\partial_f$ will denote the operator $f(\frac{\partial}{\partial x}, \frac{\partial}{\partial y})$, and $\partial_f(g)$ will denote the result of
applying this operator to a polynomial g.


(a) The rule $(f, g) = \partial_f(g)_0$ defines a bilinear form on V, the subscript 0 denoting
evaluation of a polynomial at the origin. Prove that this form is symmetric and
positive definite, and that the monomials $x^iy^j$ form an orthogonal basis of V (not an
orthonormal basis).

(b) We also have the operator of multiplication by f, which we write as $m_f$. So
$m_f(g) = fg$. Prove that $\partial_f$ and $m_f$ are adjoint operators.


(c) When f = x? + y’, the operator a f is the Laplacian, which is often written as 
A. A polynomial A is harmonic if Ah = 0. Let H denote the space of harmonic 
polynomials. Identify the space H+ orthogonal to H with respect to the given form. 


lSuggested by Serge Lang 


CHAPTER 9


Linear Groups 


In these days the angel of topology and the devil of abstract algebra 
fight for the soul of every individual discipline of mathematics. 


—Hermann Weyl¹


9.1 THE CLASSICAL GROUPS 


Subgroups of the general linear group $GL_n$ are called linear groups, or matrix groups. The
most important ones are the special linear, orthogonal, unitary, and symplectic groups — the 
classical groups. Some of them will be familiar, but let’s review the definitions. 


The real special linear group $SL_n$ is the group of real matrices with determinant 1:
(9.1.1) $SL_n = \{P \in GL_n(\mathbb{R}) \mid \det P = 1\}$.

The orthogonal group $O_n$ is the group of real matrices P such that $P^t = P^{-1}$:
(9.1.2) $O_n = \{P \in GL_n(\mathbb{R}) \mid P^tP = I\}$.


A change of basis by an orthogonal matrix preserves the dot product $X^tY$ on $\mathbb{R}^n$.
The unitary group $U_n$ is the group of complex matrices P such that $P^* = P^{-1}$:


(9.1.3) $U_n = \{P \in GL_n(\mathbb{C}) \mid P^*P = I\}$.


A change of basis by a unitary matrix preserves the standard Hermitian product $X^*Y$
on $\mathbb{C}^n$.
The symplectic group is the group of real matrices that preserve the skew-symmetric
form $X^tSY$ on $\mathbb{R}^{2n}$, where
$S = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}$:


(9.1.4) $SP_{2n} = \{P \in GL_{2n}(\mathbb{R}) \mid P^tSP = S\}$.


¹This quote is taken from Morris Kline’s book Mathematical Thought from Ancient to Modern Times.

There are analogues of the orthogonal group for indefinite forms. The Lorentz group
is the group of real matrices that preserve the Lorentz form (8.2.2):


(9.1.5) $O_{3,1} = \{P \in GL_4 \mid P^tI_{3,1}P = I_{3,1}\}$.


The linear operators represented by these matrices are called Lorentz transformations. An
analogous group $O_{p,m}$ can be defined for any signature p, m.


The word special is added to indicate the subgroup of matrices with determinant 1: 


Special orthogonal group $SO_n$: real orthogonal matrices with determinant 1,
Special unitary group $SU_n$: unitary matrices with determinant 1.


Though this is not obvious from the definition, symplectic matrices have determinant 1, so 
the two uses of the letter S do not conflict. 


Many of these groups have complex analogues, defined by the same relations. But 
except in Section 9.8, $GL_n$, $SL_n$, $O_n$, and $SP_{2n}$ stand for the real groups in this chapter.
Note that the complex orthogonal group is not the same as the unitary group. The defining
properties of these two groups are $P^tP = I$ and $P^*P = I$, respectively.


We plan to describe geometric properties of the classical groups, viewing them as 
subsets of the spaces of matrices. The word “homeomorphism” from topology will come 
up. A homeomorphism $\varphi: X \to Y$ is a continuous bijective map whose inverse function
is also continuous [Munkres, p. 105]. Homeomorphic sets are topologically equivalent. It
is important not to confuse the words “homomorphism” and “homeomorphism,” though,
unfortunately, their only difference is that “homeomorphism” has one more letter.

The geometry of a few linear groups will be familiar. The unit circle,


$x_0^2 + x_1^2 = 1$,


for instance, has several incarnations as a group, all isomorphic. Writing $(x_0, x_1) =
(\cos\theta, \sin\theta)$ identifies the circle as the additive group of angles. Or, thinking of it as
the unit circle in the complex plane by $e^{i\theta}$, it becomes a multiplicative group, the group of
unitary $1 \times 1$ matrices:


(9.1.6) $U_1 = \{p \in \mathbb{C}^\times \mid \bar{p}p = 1\}$.

The unit circle can also be embedded into $\mathbb{R}^{2 \times 2}$ by the map


(9.1.7) $(\cos\theta, \sin\theta) \rightsquigarrow \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.


It is isomorphic to the special orthogonal group $SO_2$, the group of rotations of the plane.
These are three descriptions of what is essentially the same group, the circle group.
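
As a small numerical illustration, which is not part of the original text, the following Python sketch checks that adding angles, multiplying unit complex numbers, and multiplying the rotation matrices of (9.1.7) all describe the same operation. The helper names `as_complex`, `as_rotation`, `matmul2`, and `close` are chosen here for illustration only.

```python
import cmath
import math

def as_complex(theta):
    """Point of the unit circle as the unit complex number e^{i*theta}."""
    return cmath.exp(1j * theta)

def as_rotation(theta):
    """Point of the unit circle as the 2x2 rotation matrix of (9.1.7)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

alpha, beta = 0.7, 2.1
# Adding angles corresponds to multiplying unit complex numbers ...
same_in_U1 = abs(as_complex(alpha) * as_complex(beta) - as_complex(alpha + beta)) < 1e-12
# ... and to multiplying rotation matrices.
same_in_SO2 = close(matmul2(as_rotation(alpha), as_rotation(beta)),
                    as_rotation(alpha + beta))
print(same_in_U1, same_in_SO2)
```

The comparison uses a small tolerance because the check is done in floating point; it illustrates the isomorphisms but does not prove them.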

The dimension of a linear group G is, roughly speaking, the number of degrees of
freedom of a matrix in G. The circle group has dimension 1. The group $SL_2$ has dimension
3, because the equation det P = 1 eliminates one degree of freedom from the four matrix
entries. We discuss dimension more carefully in Section 9.7, but we want to describe some
of the low-dimensional groups first. The smallest dimension in which really interesting
nonabelian groups appear is 3, and the most important ones are $SU_2$, $SO_3$, and $SL_2$. We
examine the special unitary group $SU_2$ and the rotation group $SO_3$ in Sections 9.3 and 9.4.


9.2 INTERLUDE: SPHERES


By analogy with the unit sphere in $\mathbb{R}^3$, the locus

$\{x_0^2 + x_1^2 + \cdots + x_n^2 = 1\}$

in $\mathbb{R}^{n+1}$ is called the n-dimensional unit sphere, or the n-sphere, for short. We’ll denote it by
$S^n$. Thus the unit sphere in $\mathbb{R}^3$ is the 2-sphere $S^2$, and the unit circle in $\mathbb{R}^2$ is the 1-sphere $S^1$.
A space that is homeomorphic to a sphere may sometimes be called a sphere too.

We review stereographic projection from the 2-sphere to the plane, because it can be 
used to give topological descriptions of the sphere that have analogues in other dimensions. 
We think of the $x_0$-axis as the vertical axis in $(x_0, x_1, x_2)$-space $\mathbb{R}^3$. The north pole on the
sphere is the point p = (1, 0, 0). We also identify the locus $\{x_0 = 0\}$ with a plane that we
call V, and we label the coordinates in V as $v_1, v_2$. The point $(v_1, v_2)$ of V corresponds to
$(0, v_1, v_2)$ in $\mathbb{R}^3$.

Stereographic projection $\pi: S^2 \to V$ is defined as follows: To obtain the image $\pi(x)$ of
a point x on the sphere, one constructs the line $\ell$ that passes through p and x. The projection
$\pi(x)$ is the intersection of $\ell$ with V. The projection is bijective at all points of $S^2$ except the
north pole, which is “sent to infinity.”


(9.2.1) Stereographic Projection. 


One way to construct the sphere topologically is as the union of the plane V and a single
point, the north pole. The inverse function to $\pi$ does this. It shrinks the plane a lot near
infinity, because a small circle about p on the sphere corresponds to a large circle in the plane.

Stereographic projection is the identity map on the equator. It maps the southern
hemisphere bijectively to the unit disk $\{v_1^2 + v_2^2 \leq 1\}$ in V, and the northern hemisphere to
the exterior $\{v_1^2 + v_2^2 \geq 1\}$ of the disk, except that the north pole is missing from the exterior.
On the other hand, stereographic projection from the south pole would map the northern 
hemisphere to the disk. Both hemispheres correspond bijectively to disks. This provides a 
second way to build the sphere topologically, as the union of two unit disks glued together 
along their boundaries. The disks need to be stretched, like blowing up a balloon, to make 
the actual sphere. 

To determine the formula for stereographic projection, we write the line through p
and x in the parametric form $q(t) = p + t(x - p) = (1 + t(x_0 - 1), tx_1, tx_2)$. The point $q(t)$
is in the plane V when $t = \frac{1}{1 - x_0}$. So


(9.2.2) $\pi(x) = (v_1, v_2) = \left(\frac{x_1}{1 - x_0}, \frac{x_2}{1 - x_0}\right)$.


Stereographic projection $\pi$ from the n-sphere to n-space is defined in exactly the
same way. The north pole on the n-sphere is the point $p = (1, 0, \ldots, 0)$, and we identify
the locus $\{x_0 = 0\}$ in $\mathbb{R}^{n+1}$ with an n-space V. A point $(v_1, \ldots, v_n)$ of V corresponds to
$(0, v_1, \ldots, v_n)$ in $\mathbb{R}^{n+1}$. The image $\pi(x)$ of a point x on the sphere is the intersection of the
line $\ell$ through the north pole p and x with V. As before, the north pole p is sent to infinity,
and $\pi$ is bijective at all points of $S^n$ except p. The formula for $\pi$ is


(9.2.3) $\pi(x) = \left(\frac{x_1}{1 - x_0}, \ldots, \frac{x_n}{1 - x_0}\right)$.


This projection maps the lower hemisphere $\{x_0 \leq 0\}$ bijectively to the n-dimensional
unit ball in V, the locus $\{v_1^2 + \cdots + v_n^2 \leq 1\}$, while projection from the south pole maps the
upper hemisphere $\{x_0 \geq 0\}$ to the unit ball. So, as is true for the 2-sphere, the n-sphere can
be constructed topologically in two ways: as the union of an n-space V and a single point
p, or as the union of two copies of the n-dimensional unit ball, glued together along their
boundaries, which are (n − 1)-spheres, and stretched appropriately.
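
The projection formula can be tried out numerically. The sketch below is an illustration added here, not part of the original text. It implements the formula for stereographic projection from the north pole in any dimension, together with an inverse map that is derived for this example by solving $v_i = x_i/(1 - x_0)$ on the unit sphere. The function names are hypothetical.

```python
def stereographic(x):
    """Project a point x = (x0, ..., xn) of the n-sphere, other than the
    north pole (1, 0, ..., 0), to the n-space V = {x0 = 0}."""
    x0 = x[0]
    if abs(1.0 - x0) < 1e-15:
        raise ValueError("the north pole has no image in V")
    return tuple(xi / (1.0 - x0) for xi in x[1:])

def inverse_stereographic(v):
    """Inverse map from V to the sphere minus the north pole.
    Derived for this sketch: with r2 = |v|^2,
    x0 = (r2 - 1)/(r2 + 1) and xi = 2*vi/(r2 + 1)."""
    r2 = sum(vi * vi for vi in v)
    return ((r2 - 1.0) / (r2 + 1.0),) + tuple(2.0 * vi / (r2 + 1.0) for vi in v)

# The south pole of the 3-sphere goes to the center of the unit ball,
# and points of the equator {x0 = 0} are left where they are.
south_pole = (-1.0, 0.0, 0.0, 0.0)
equator_point = (0.0, 0.6, 0.0, 0.8)
print(stereographic(south_pole))
print(stereographic(equator_point))
```

Since $|\pi(x)|^2 = (1 + x_0)/(1 - x_0)$ on the sphere, a point with $x_0 \leq 0$ lands in the unit ball, which agrees with the description of the lower hemisphere.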


We are particularly interested in the three-dimensional sphere $S^3$, and it is worth making
some effort to become acquainted with this locus. Topologically, $S^3$ can be constructed either
as the union of 3-space V and a single point p, or as the union of two copies of the unit
ball $\{v_1^2 + v_2^2 + v_3^2 \leq 1\}$ in $\mathbb{R}^3$, glued together along their boundaries (which are ordinary
2-spheres) and stretched. Neither construction can be made in three-dimensional space.

We can think of V as the space in which we live. Then via stereographic projection, 
the lower hemisphere of the 3-sphere $S^3$ corresponds to the unit ball in space. Traditionally,
it is depicted as the terrestrial sphere, the Earth. The upper hemisphere corresponds to the
exterior of the Earth, the sky.

On the other hand, the upper hemisphere can be made to correspond to the unit ball
via projection from the south pole. When thinking of it this way, it is depicted traditionally as
the celestial sphere. (The phrases “terrestrial ball” and “celestial ball” would fit mathematical
terminology better, but they wouldn’t be traditional.)


(9.2.4) A Model of the Celestial Sphere.

To understand this requires some thought. When the upper hemisphere is represented
as the celestial sphere, the center of the ball corresponds to the north pole of $S^3$, and to
infinity in our space V. While looking at a celestial globe from its exterior, you must imagine 
that you are standing on the Earth, looking out at the sky. It is a common mistake to think 
of the Earth as the center of the celestial sphere. 


Latitudes and Longitudes on the 3-Sphere 


The curves of constant latitude on the globe, the 2-sphere $\{x_0^2 + x_1^2 + x_2^2 = 1\}$, are the
horizontal circles $x_0 = c$, with $-1 < c < 1$, and the curves of constant longitude are the
vertical great circles through the poles. The longitude curves can be described as intersections
of the 2-sphere with the two-dimensional subspaces of $\mathbb{R}^3$ that contain the pole (1, 0, 0).

When we go to the 3-sphere $\{x_0^2 + x_1^2 + x_2^2 + x_3^2 = 1\}$, the dimension increases, and one
has to make some decisions about what the analogues should be. We use analogues that will
have algebraic significance for the group $SU_2$ that we study in the next section.

As analogues of latitude curves on the 3-sphere, we take the “horizontal” surfaces,
the surfaces on which the $x_0$-coordinate is constant. We call these loci latitudes. They are
two-dimensional spheres, embedded into $\mathbb{R}^4$ by


(9.2.5) $x_0 = c, \quad x_1^2 + x_2^2 + x_3^2 = (1 - c^2)$, with $-1 < c < 1$.


The particular latitude defined by $x_0 = 0$ is the intersection of the 3-sphere with the
horizontal space V. It is the unit 2-sphere $\{v_1^2 + v_2^2 + v_3^2 = 1\}$ in V. We call this latitude the
equator, and we denote it by E. 

Next, as analogues of the longitude curves, we take the great circles through the north 
pole (1, 0, 0, 0). They are the intersections of the 3-sphere with two-dimensional subspaces
W of $\mathbb{R}^4$ that contain the pole. The intersection $L = W \cap S^3$ will be the unit circle in W, and
we call L a longitude. If we choose an orthonormal basis (p, v) for the space W, the first
vector being the north pole, the longitude will have the parametrization


(9.2.6) $L: \ell(\theta) = \cos\theta\, p + \sin\theta\, v$.


This is elementary, but we verify it below.
Thus, while the latitudes on $S^3$ are 2-spheres, the longitudes are 1-spheres.


Lemma 9.2.7 Let (p, v) be an orthonormal basis for a subspace W of $\mathbb{R}^4$, the first vector
being the north pole p, and let L be the longitude of unit vectors in W.


(a) L meets the equator E in two points. If v is one of those points, the other one is −v.


(b) L has the parametrization (9.2.6). If q is a point of L, then replacing v by −v if necessary,
one can express q in the form $\ell(\theta)$ with $\theta$ in the interval $0 \leq \theta \leq \pi$, and then this
representation of a point of L is unique for all $\theta \neq 0, \pi$.


(c) Except for the two poles, every point of the sphere $S^3$ lies on a unique longitude.


Proof. We omit the proof of (a). 


(b) This is seen by computing the length of a vector ap + bv of W:
$|ap + bv|^2 = a^2(p \cdot p) + 2ab(p \cdot v) + b^2(v \cdot v) = a^2 + b^2$.
So ap + bv is a unit vector if and only if the point (a, b) lies on the unit circle, in which case
$a = \cos\theta$ and $b = \sin\theta$ for some $\theta$.


(c) Let x be a unit vector in $\mathbb{R}^4$, not on the vertical axis. Then the set (p, x) is independent,
and therefore spans a two-dimensional subspace W containing p. So x lies in just one such
subspace, and in just one longitude. □


9.3 THE SPECIAL UNITARY GROUP SU2 


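The elements of $SU_2$ are complex $2 \times 2$ matrices of the form


(9.3.1) $P = \begin{bmatrix} a & b \\ -\bar{b} & \bar{a} \end{bmatrix}$, with $\bar{a}a + \bar{b}b = 1$.


Let’s verify this. Let $P = \begin{bmatrix} a & b \\ u & v \end{bmatrix}$ be an element of $SU_2$, with a, b, u, v in $\mathbb{C}$. The equations
that define $SU_2$ are $P^* = P^{-1}$ and det P = 1. When det P = 1, the equation $P^* = P^{-1}$
becomes

$\begin{bmatrix} \bar{a} & \bar{u} \\ \bar{b} & \bar{v} \end{bmatrix} = P^* = P^{-1} = \begin{bmatrix} v & -b \\ -u & a \end{bmatrix}$.


Therefore $v = \bar{a}$, $u = -\bar{b}$, and then $\det P = \bar{a}a + \bar{b}b = 1$. □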


Writing $a = x_0 + x_1i$ and $b = x_2 + x_3i$ defines a bijective correspondence of $SU_2$ with
the unit 3-sphere $\{x_0^2 + x_1^2 + x_2^2 + x_3^2 = 1\}$ in $\mathbb{R}^4$.


(9.3.2) $SU_2 \longleftrightarrow S^3$: $\quad P = \begin{bmatrix} x_0 + x_1i & x_2 + x_3i \\ -x_2 + x_3i & x_0 - x_1i \end{bmatrix} \longleftrightarrow (x_0, x_1, x_2, x_3)$.


This gives us two notations for an element of $SU_2$. We use the matrix notation as much as
possible, because it is best for computation in the group, but length and orthogonality refer
to dot product in $\mathbb{R}^4$.


Note: The fact that the 3-sphere has a group structure is remarkable. There is no way to
make the 2-sphere into a group. A famous theorem of topology asserts that the only spheres
on which one can define continuous group laws are the 1-sphere and the 3-sphere. □


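In matrix notation, the north pole $e_0 = (1, 0, 0, 0)$ on the sphere is the identity matrix I.
The other standard basis vectors are the matrices that define the quaternion group (2.4.5).
We list them again for reference:


(9.3.3) $\mathbf{i} = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}, \ \mathbf{j} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \ \mathbf{k} = \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix} \longleftrightarrow e_1, e_2, e_3$.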


These matrices satisfy relations such as $\mathbf{ij} = \mathbf{k}$ that were displayed in (2.4.6). The real vector
space with basis $(I, \mathbf{i}, \mathbf{j}, \mathbf{k})$ is called the quaternion algebra. So $SU_2$ can be thought of as the
set of unit vectors in the quaternion algebra.

Lemma 9.3.4 Except for the two special matrices ±I, the eigenvalues of P (9.3.2) are
complex conjugate numbers of absolute value 1.


Proof. The characteristic polynomial of P is $t^2 - 2x_0t + 1$, and its discriminant D is $4x_0^2 - 4$.
When $(x_0, x_1, x_2, x_3)$ is on the unit sphere, $x_0$ is in the interval $-1 < x_0 < 1$, and D < 0. (In
fact, the eigenvalues of any unitary matrix have absolute value 1.) □
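
The correspondence between unit vectors and matrices, the quaternion relations, and the statement of the lemma can all be checked on examples. The Python sketch below is an illustration added here, not part of the original text. It builds the matrix attached to a point $(x_0, x_1, x_2, x_3)$ by writing $a = x_0 + x_1i$, $b = x_2 + x_3i$, and it computes eigenvalues from the trace and determinant. The names `su2`, `mul`, `close`, and `eigenvalues` are assumptions of this sketch.

```python
import cmath

def su2(x0, x1, x2, x3):
    """Matrix [[a, b], [-conj(b), conj(a)]] with a = x0 + x1*i, b = x2 + x3*i."""
    a, b = complex(x0, x1), complex(x2, x3)
    return [[a, b], [-b.conjugate(), a.conjugate()]]

def mul(p, q):
    """Product of two 2x2 complex matrices."""
    return [[sum(p[r][m] * q[m][c] for m in range(2)) for c in range(2)]
            for r in range(2)]

def close(p, q, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(p[r][c] - q[r][c]) < tol for r in range(2) for c in range(2))

def eigenvalues(p):
    """Roots of t^2 - (trace p) t + det p."""
    tr = p[0][0] + p[1][1]
    det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

# The matrices corresponding to e1, e2, e3, and the matrix -I.
QI, QJ, QK = su2(0, 1, 0, 0), su2(0, 0, 1, 0), su2(0, 0, 0, 1)
MINUS_ONE = su2(-1, 0, 0, 0)

print(close(mul(QI, QJ), QK))        # the relation ij = k
print(close(mul(QI, QI), MINUS_ONE)) # the relation i^2 = -I

# Eigenvalues of a sample element of SU2 other than +I or -I.
lam1, lam2 = eigenvalues(su2(0.5, 0.5, 0.5, 0.5))
print(abs(lam1), abs(lam2))
```

For the sample point the trace is $2x_0 = 1$, so the two eigenvalues are the conjugate roots of $t^2 - t + 1$.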


We now describe the algebraic structures on $SU_2$ that correspond to the latitudes and
longitudes on $S^3$ that were defined in the previous section.


Proposition 9.3.5 The latitudes in $SU_2$ are conjugacy classes. For a given c in the interval
$-1 < c < 1$, the latitude $\{x_0 = c\}$ consists of the matrices P in $SU_2$ such that trace P = 2c.
The remaining conjugacy classes are {I} and {−I}. They make up the center of $SU_2$.


The proposition follows from the next lemma. 


Lemma 9.3.6 Let P be an element of $SU_2$ with eigenvalues $\lambda$ and $\bar\lambda$. There is an element Q
in $SU_2$ such that $Q^*PQ$ is the diagonal matrix $\Lambda$ with diagonal entries $\lambda$ and $\bar\lambda$. Therefore all
elements of $SU_2$ with the same eigenvalues, or with the same trace, are conjugate.


Proof. One can base a proof of the lemma on the Spectral Theorem for unitary operators,
or verify it directly as follows: Let $X = (u, v)^t$ be an eigenvector of P of length 1, with
eigenvalue $\lambda$, and let $Y = (-\bar{v}, \bar{u})^t$. You will be able to check that Y is an eigenvector of P
with eigenvalue $\bar\lambda$, that the matrix $Q = \begin{bmatrix} u & -\bar{v} \\ v & \bar{u} \end{bmatrix}$ is in $SU_2$, and that $PQ = Q\Lambda$. □


The equator E of $SU_2$ is the latitude defined by the equation trace P = 0 (or $x_0 = 0$).
A point on the equator has the form


(9.3.7) $A = \begin{bmatrix} x_1i & x_2 + x_3i \\ -x_2 + x_3i & -x_1i \end{bmatrix} = x_1\mathbf{i} + x_2\mathbf{j} + x_3\mathbf{k}$.

Notice that the matrix A is skew-Hermitian: $A^* = -A$, and that its trace is zero. We haven’t
run across skew-Hermitian matrices before, but they are closely related to Hermitian
matrices: a matrix A is skew-Hermitian if and only if iA is Hermitian.

The $2 \times 2$ skew-Hermitian matrices with trace zero form a real vector space of
dimension 3 that we denote by V, in agreement with the notation used in the previous
section. The space V is the orthogonal space to I. It has the basis $(\mathbf{i}, \mathbf{j}, \mathbf{k})$, and E is the unit
2-sphere in V.


Proposition 9.3.8 The following conditions on an element A of SU2 are equivalent: 


• A is on the equator, i.e., trace A = 0,
• the eigenvalues of A are i and −i,
• $A^2 = -I$.


Proof. The equivalence of the first two statements follows by inspection of the characteristic
polynomial $t^2 - (\text{trace}\, A)t + 1$. For the third statement, we note that −I is the only matrix
in $SU_2$ with an eigenvalue −1. If $\lambda$ is an eigenvalue of A, then $\lambda^2$ is an eigenvalue of $A^2$. So
$\lambda = \pm i$ if and only if $A^2$ has eigenvalues −1, in which case $A^2 = -I$. □


Next, we consider the longitudes of SU2, the intersections of SU2 with two-dimensional 
subspaces of $\mathbb{R}^4$ that contain the pole I. We use matrix notation.


Proposition 9.3.9 Let W be a two-dimensional subspace of $\mathbb{R}^4$ that contains I, and let L be
the longitude of unit vectors in W.


(a) L meets the equator E in two points. If A is one of them, the other one is −A. Moreover,
(I, A) is an orthonormal basis of W.

(b) The elements of L can be written in the form $P_\theta = (\cos\theta)I + (\sin\theta)A$, with A on E
and $0 \leq \theta \leq 2\pi$. When $P \neq \pm I$, A and $\theta$ can be chosen with $0 < \theta < \pi$, and then the
expression for P is unique.

(c) Every element of $SU_2$ except ±I lies on a unique longitude. The elements ±I lie on
every longitude.

(d) The longitudes are conjugate subgroups of $SU_2$.


Proof. When one translates to matrix notation, the first three assertions become Lemma
9.2.7. To prove (d), we first verify that a longitude L is a subgroup. Let c, s and c′, s′ denote
the cosine and sine of the angles $\alpha$ and $\alpha'$, respectively, and let $\beta = \alpha + \alpha'$. Then because
$A^2 = -I$, the addition formulas for cosine and sine show that


$(cI + sA)(c'I + s'A) = (cc' - ss')I + (cs' + sc')A = (\cos\beta)I + (\sin\beta)A$.


So L is closed under multiplication. It is also closed under inversion.

Finally, we verify that the longitudes are conjugate. Say that L is the longitude
$P_\theta = cI + sA$, as above. Proposition 9.3.5 tells us that A is conjugate to $\mathbf{i}$, say $\mathbf{i} = QAQ^*$.
Then $QP_\theta Q^* = cQIQ^* + sQAQ^* = cI + s\mathbf{i}$. So L is conjugate to the longitude $cI + s\mathbf{i}$. □
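
The subgroup property of a longitude is easy to test numerically. The following Python sketch is an illustration added here, not part of the original text. It picks one point A on the equator, forms $P_\theta = (\cos\theta)I + (\sin\theta)A$, and checks that angles add under multiplication. The particular point A and the helper names are assumptions of this sketch.

```python
import math

def su2(x0, x1, x2, x3):
    """Matrix [[a, b], [-conj(b), conj(a)]] with a = x0 + x1*i, b = x2 + x3*i."""
    a, b = complex(x0, x1), complex(x2, x3)
    return [[a, b], [-b.conjugate(), a.conjugate()]]

def mul(p, q):
    """Product of two 2x2 complex matrices."""
    return [[sum(p[r][m] * q[m][c] for m in range(2)) for c in range(2)]
            for r in range(2)]

def close(p, q, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(p[r][c] - q[r][c]) < tol for r in range(2) for c in range(2))

def longitude_point(theta, A):
    """The matrix (cos theta) I + (sin theta) A."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * (1 if r == col else 0) + s * A[r][col] for col in range(2)]
            for r in range(2)]

# A sample point of the equator: x0 = 0 and x1^2 + x2^2 + x3^2 = 1.
A = su2(0.0, 2 / 3, 1 / 3, 2 / 3)
IDENTITY = su2(1, 0, 0, 0)
MINUS_IDENTITY = su2(-1, 0, 0, 0)

alpha, alpha_prime = 0.9, 1.7
product = mul(longitude_point(alpha, A), longitude_point(alpha_prime, A))
print(close(mul(A, A), MINUS_IDENTITY))                       # A^2 = -I
print(close(product, longitude_point(alpha + alpha_prime, A)))  # angles add
```

Closure under inversion shows up the same way: the product of `longitude_point(theta, A)` and `longitude_point(-theta, A)` is the identity.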


Examples 9.3.10 


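• The longitude $cI + s\mathbf{i}$, with $c = \cos\theta$ and $s = \sin\theta$, is the group of diagonal matrices
in $SU_2$. We denote this longitude by T. Its elements have the form

$c\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + s\begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix} = \begin{bmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{bmatrix}$.

• The longitude $cI + s\mathbf{j}$ is the group of real matrices in $SU_2$, the rotation group $SO_2$.
The matrix $cI + s\mathbf{j}$ represents rotation of the plane through the angle $-\theta$.

$c\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + s\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}$.

• We haven’t run across the longitude $cI + s\mathbf{k}$ before. □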


The figure below was made by Bill Schelter. It shows a projection of the 3-sphere $SU_2$
onto the unit disc in the plane. The elliptical disc shown is the image of the equator. Just
as the orthogonal projection of a circle from $\mathbb{R}^3$ to $\mathbb{R}^2$ is an ellipse, the projection of the
2-sphere E from $\mathbb{R}^4$ to $\mathbb{R}^3$ is an ellipsoid, and the further projection of this ellipsoid to the
plane maps it onto an elliptical disc. Every point in the interior of the disc is the image of
two points of E.


[Figure labels: Diagonal matrices; Trace-zero matrices; SU2.]


(9.3.11) Some Latitudes and Longitudes in SU2.


9.4 THE ROTATION GROUP SO3


Since the equator E of $SU_2$ is a conjugacy class, the group operates on it by conjugation.
We will show that conjugation by an element P of $SU_2$, an operation that we denote by $\gamma_P$,
rotates this sphere. This will allow us to describe the three-dimensional rotation group $SO_3$
in terms of the special unitary group $SU_2$.

The poles of a nontrivial rotation of E are its fixed points, the intersections of E with
the axis of rotation (5.1.22). If A is on E, $(A, \alpha)$ will denote the spin that rotates E with angle
$\alpha$ about the pole A. The two spins $(A, \alpha)$ and $(-A, -\alpha)$ represent the same rotation.


Theorem 9.4.1 


(a) The rule $P \rightsquigarrow \gamma_P$ defines a surjective homomorphism $\gamma: SU_2 \to SO_3$, the spin
homomorphism. Its kernel is the center {±I} of $SU_2$.

(b) Suppose that $P = \cos\theta\, I + \sin\theta\, A$, with $0 < \theta < \pi$ and with A on E. Then $\gamma_P$ rotates E
about the pole A, through the angle $2\theta$. So $\gamma_P$ is represented by the spin $(A, 2\theta)$.


The homomorphism $\gamma$ described by this theorem is called the orthogonal representation of
$SU_2$. It sends a matrix P in $SU_2$, a complex $2 \times 2$ matrix, to a mysterious real $3 \times 3$ rotation
matrix, the matrix of $\gamma_P$. The theorem tells us that every element of $SU_2$ except ±I can
be described as a nontrivial rotation together with a choice of spin. Because of this, $SU_2$ is
often called the spin group.


We discuss the geometry of the map $\gamma$ before proving the theorem. If P is a point of
$SU_2$, the point −P is its antipodal point. Since $\gamma$ is surjective and since its kernel is the center
Z = {±I}, $SO_3$ is isomorphic to the quotient group $SU_2/Z$, whose elements are pairs of
antipodal points, the cosets {±P} of Z. Because $\gamma$ is two-to-one, $SU_2$ is called a double
covering of $SO_3$.

The homomorphism $\mu: SO_2 \to SO_2$ of the 1-sphere to itself defined by $\rho_\theta \rightsquigarrow \rho_{2\theta}$
is another, closely related, example of a double covering. Every fibre of $\mu$ consists of two
rotations, $\rho_\theta$ and $\rho_{\theta + \pi}$.

The orthogonal representation helps to describe the topological structure of the 
rotation group. Since elements of SO3 correspond to pairs of antipodal points of SU2, we 
can obtain SO3 topologically by identifying antipodal points on the 3-sphere. The space 
obtained in this way is called (real) projective 3-space, and is denoted by $\mathbb{P}^3$.


(9.4.2) $SO_3$ is homeomorphic to projective 3-space $\mathbb{P}^3$.


Points of $\mathbb{P}^3$ are in bijective correspondence with one-dimensional subspaces of $\mathbb{R}^4$. Every
one-dimensional subspace meets the unit 3-sphere in a pair of antipodal points.

The projective space $\mathbb{P}^3$ is much harder to visualize than the sphere $S^3$. However, it is
easy to describe projective 1-space $\mathbb{P}^1$, the set obtained by identifying antipodal points of
the unit circle $S^1$. If we wrap $S^1$ around so that it becomes the lefthand figure of (9.4.3), the
figure on the right will be $\mathbb{P}^1$. Topologically, $\mathbb{P}^1$ is a circle too.


(9.4.3) A Double Covering of the 1-Sphere. 


We’ll describe $\mathbb{P}^1$ again, in a way that one can attempt to extend to higher dimensional
projective spaces. Except for the two points on the horizontal axis, every pair of antipodal
points of the unit circle contains just one point in the lower semicircle. So to obtain $\mathbb{P}^1$, we
simply identify a point pair with a single point in the lower semicircle. But the endpoints of 
the semicircle, the two points on the horizontal axis, must still be identified. So we glue the 
endpoints together, obtaining a circle as before. 

In principle, the same method can be used to describe $\mathbb{P}^2$. Except for points on the
equator of the 2-sphere, a pair of antipodal points contains just one point in the lower
hemisphere. So we can form $\mathbb{P}^2$ from the lower hemisphere by identifying opposite points of
the equator. Let’s imagine that we start making this identification by gluing a short segment
of the equator to the opposite segment. Unfortunately, when we orient the equator to keep
track, we see that the opposite segment gets the opposite orientation. So when we glue the
two segments together, we have to insert a twist. This gives us, topologically, a Möbius band,
and $\mathbb{P}^2$ contains this Möbius band. It is not an orientable surface.

Then to visualize $\mathbb{P}^3$, we would take the lower hemisphere in $S^3$ and identify antipodal
points of its equator E. Or, we could take the terrestrial ball and identify antipodal points of
its boundary, the surface of the Earth. This is quite confusing. □

We begin the proof of Theorem 9.4.1 now. We recall that the equator E is the unit
2-sphere in the three-dimensional space V of trace zero, skew-Hermitian matrices (9.3.7).
Conjugation by an element P of $SU_2$ preserves both the trace and the skew-Hermitian
property, so this conjugation, which we are denoting by $\gamma_P$, operates on the whole space V.
The main point is to show that $\gamma_P$ is a rotation. This is done in Lemma 9.4.5 below.

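Let $\langle U, V\rangle$ denote the form on V that is carried over from dot product on $\mathbb{R}^3$.
The basis of V that corresponds to the standard basis of $\mathbb{R}^3$ is $(\mathbf{i}, \mathbf{j}, \mathbf{k})$ (9.3.3). We write
$U = u_1\mathbf{i} + u_2\mathbf{j} + u_3\mathbf{k}$ and use analogous notation for V. Then


$\langle U, V\rangle = u_1v_1 + u_2v_2 + u_3v_3$.

Lemma 9.4.4 With notation as above, $\langle U, V\rangle = -\frac{1}{2}\,\text{trace}(UV)$.


Proof. We compute the product UV using the quaternion relations (2.4.6):
$UV = (u_1\mathbf{i} + u_2\mathbf{j} + u_3\mathbf{k})(v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k})$
$= -(u_1v_1 + u_2v_2 + u_3v_3)I + U \times V$,
where $U \times V$ is the vector cross product
$U \times V = (u_2v_3 - u_3v_2)\mathbf{i} + (u_3v_1 - u_1v_3)\mathbf{j} + (u_1v_2 - u_2v_1)\mathbf{k}$.
Then because trace I = 2, and because $\mathbf{i}, \mathbf{j}, \mathbf{k}$ have trace zero,
$\text{trace}(UV) = -2(u_1v_1 + u_2v_2 + u_3v_3) = -2\langle U, V\rangle$. □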
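
The trace formula and the appearance of the cross product can be confirmed on a sample pair of vectors. The Python sketch below is an illustration added here, not part of the original text. The helpers `pure`, `form`, and `cross` are names chosen for this sketch, and the vectors are arbitrary.

```python
def pure(u):
    """The trace-zero skew-Hermitian matrix u1*i + u2*j + u3*k, as in (9.3.7)."""
    u1, u2, u3 = u
    return [[complex(0, u1), complex(u2, u3)],
            [complex(-u2, u3), complex(0, -u1)]]

def mul(p, q):
    """Product of two 2x2 complex matrices."""
    return [[sum(p[r][m] * q[m][c] for m in range(2)) for c in range(2)]
            for r in range(2)]

def form(u, v):
    """The value -1/2 trace(UV), which the lemma identifies with the dot product."""
    m = mul(pure(u), pure(v))
    return -0.5 * (m[0][0] + m[1][1]).real

def cross(u, v):
    """Ordinary vector cross product in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

u, v = (1, 2, 3), (4, -5, 6)
dot = sum(a * b for a, b in zip(u, v))
print(form(u, v), dot)

# UV + <U, V> I should be the matrix of the cross product U x V.
uv = mul(pure(u), pure(v))
remainder = [[uv[r][c] + (dot if r == c else 0) for c in range(2)] for r in range(2)]
print(remainder == pure(cross(u, v)))
```

With integer entries the arithmetic is exact, so the comparison can be made without a tolerance; for general real vectors a small tolerance would be the safer choice.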
Lemma 9.4.5 The operator $\gamma_P$ is a rotation of E and of V.

Proof. For review, $\gamma_P$ is the operator defined by $\gamma_PU = PUP^*$. The safest way to prove that
this operator is a rotation may be to compute its matrix. But the matrix is too complicated to
give much insight. It is nicer to describe $\gamma$ indirectly. We will show that $\gamma_P$ is an orthogonal
linear operator with determinant 1. Euler’s Theorem 5.1.25 will tell us that it is a rotation.

To show that $\gamma_P$ is a linear operator, we must show that for all U and V in V and
all real numbers r, $\gamma_P(U + V) = \gamma_PU + \gamma_PV$ and $\gamma_P(rU) = r(\gamma_PU)$. We omit this routine
verification. To prove that $\gamma_P$ is orthogonal, we verify the criterion (8.6.9) for orthogonality,
which is

(9.4.6) $\langle \gamma_PU, \gamma_PV\rangle = \langle U, V\rangle$.


This follows from the previous lemma, because trace is preserved by conjugation.


$\langle \gamma_PU, \gamma_PV\rangle = -\frac{1}{2}\,\text{trace}((\gamma_PU)(\gamma_PV)) = -\frac{1}{2}\,\text{trace}(PUP^*PVP^*)$
$= -\frac{1}{2}\,\text{trace}(PUVP^*) = -\frac{1}{2}\,\text{trace}(UV) = \langle U, V\rangle$.


Finally, to show that the determinant of $\gamma_P$ is 1, we recall that the determinant of any
orthogonal matrix is ±1. Since $SU_2$ is a sphere, it is path connected, and since the
determinant is a continuous function, only one of the two values ±1 can be taken on by
$\det \gamma_P$. When P = I, $\gamma_P$ is the identity operator, which has determinant 1. So $\det \gamma_P = 1$ for
every P. □

We now prove part (a) of the theorem. Because $\gamma_P$ is a rotation, $\gamma$ maps $SU_2$ to $SO_3$.
The verification that $\gamma$ is a homomorphism is simple: $\gamma_P\gamma_Q = \gamma_{PQ}$ because


$\gamma_P(\gamma_QU) = P(QUQ^*)P^* = (PQ)U(PQ)^* = \gamma_{PQ}U$.


We show next that the kernel of $\gamma$ is ±I. If P is in the kernel, conjugation by P fixes
every element of E, which means that P commutes with every such element. Any element of
$SU_2$ can be written in the form Q = cI + sB with B in E. Then P commutes with Q too. So P
is in the center {±I} of $SU_2$. The fact that $\gamma$ is surjective will follow, once we identify $2\theta$ as
the angle of rotation, because every angle $\alpha$ has the form $2\theta$, with $0 \leq \theta \leq \pi$.


Let P be an element of $SU_2$, written in the form $P = \cos\theta\, I + \sin\theta\, A$ with A in E. It is
true that $\gamma_PA = A$, because P commutes with A, so A is a pole of $\gamma_P$. Let $\alpha$ denote the angle
of rotation of $\gamma_P$ about the pole A. To identify this angle, we show first that it is enough to
identify the angle for a single matrix P in a conjugacy class.

Say that $P' = QPQ^* (= \gamma_QP)$ is a conjugate, where Q is another element of $SU_2$. Then
$P' = \cos\theta\, I + \sin\theta\, A'$, where $A' = \gamma_QA = QAQ^*$. The angle $\theta$ has not changed.

Next, we apply Corollary 5.1.28, which asserts that if M and N are elements of $SO_3$, and
if M is a rotation with angle $\alpha$ about the pole X, then the conjugate $M' = NMN^{-1}$ is a rotation
with the same angle $\alpha$ about the pole NX. Since $\gamma$ is a homomorphism, $\gamma_{P'} = \gamma_Q\gamma_P\gamma_Q^{-1}$.
Since $\gamma_P$ is a rotation with angle $\alpha$ about A, $\gamma_{P'}$ is a rotation with angle $\alpha$ about $A' = \gamma_QA$.
The angle $\alpha$ hasn’t changed either.

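This being so, we make the computation for the matrix $P = \cos\theta\, I + \sin\theta\, \mathbf{i}$, which is the
diagonal matrix with diagonal entries $e^{i\theta}$ and $e^{-i\theta}$. We apply $\gamma_P$ to $\mathbf{j}$:


(9.4.7) $\gamma_P\mathbf{j} = P\mathbf{j}P^* = \begin{bmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} e^{-i\theta} & 0 \\ 0 & e^{i\theta} \end{bmatrix} = \begin{bmatrix} 0 & e^{2i\theta} \\ -e^{-2i\theta} & 0 \end{bmatrix}$

$= \cos 2\theta\, \mathbf{j} + \sin 2\theta\, \mathbf{k}$.


The set $(\mathbf{j}, \mathbf{k})$ is an orthonormal basis of the orthogonal space W to $\mathbf{i}$, and the equation above
shows that $\gamma_P$ rotates the vector $\mathbf{j}$ through the angle $2\theta$ in W. The angle of rotation is $2\theta$, as
predicted. This completes the proof of Theorem (9.4.1). □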
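
Although the matrix of $\gamma_P$ is awkward to write down in general, it is easy to compute numerically. The Python sketch below is an illustration added here, not part of the original text. It conjugates each of the basis matrices by P, reads off coordinates, and assembles the $3 \times 3$ matrix of $\gamma_P$. The function names and the sample angle are assumptions of this sketch.

```python
import math

def su2(x0, x1, x2, x3):
    """Matrix [[a, b], [-conj(b), conj(a)]] with a = x0 + x1*i, b = x2 + x3*i."""
    a, b = complex(x0, x1), complex(x2, x3)
    return [[a, b], [-b.conjugate(), a.conjugate()]]

def mul(p, q):
    """Product of two 2x2 complex matrices."""
    return [[sum(p[r][m] * q[m][c] for m in range(2)) for c in range(2)]
            for r in range(2)]

def dagger(p):
    """Conjugate transpose P*."""
    return [[p[c][r].conjugate() for c in range(2)] for r in range(2)]

def coords(m):
    """Coordinates (x1, x2, x3) of a trace-zero skew-Hermitian matrix (9.3.7)."""
    return (m[0][0].imag, m[0][1].real, m[0][1].imag)

def gamma(P):
    """3x3 matrix of U -> P U P* with respect to the basis (i, j, k).
    Column c holds the coordinates of the image of the c-th basis matrix."""
    basis = [su2(0, 1, 0, 0), su2(0, 0, 1, 0), su2(0, 0, 0, 1)]
    cols = [coords(mul(mul(P, B), dagger(P))) for B in basis]
    return [[cols[c][r] for c in range(3)] for r in range(3)]

theta = 0.4
P = su2(math.cos(theta), math.sin(theta), 0.0, 0.0)         # cos(theta) I + sin(theta) i
minus_P = su2(-math.cos(theta), -math.sin(theta), 0.0, 0.0)  # the antipodal point

R = gamma(P)
for row in R:
    print([round(entry, 6) for entry in row])
```

For this P the matrix fixes the first basis vector and rotates the plane of the other two through the angle $2\theta = 0.8$, and `gamma(minus_P)` gives the same matrix, in line with the kernel {±I}.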


9.5 ONE-PARAMETER GROUPS 

In Chapter 5, we used the matrix-valued function

(9.5.1) $e^{tA} = I + \frac{tA}{1!} + \frac{t^2A^2}{2!} + \frac{t^3A^3}{3!} + \cdots$


to describe solutions of the differential equation $\frac{dX}{dt} = AX$. The same function describes the
one-parameter groups in the general linear group, that is, the differentiable homomorphisms from
the additive group $\mathbb{R}^+$ of real numbers to $GL_n$.


Theorem 9.5.2 


(a) Let A be an arbitrary real or complex matrix, and let $GL_n$ denote $GL_n(\mathbb{R})$ or $GL_n(\mathbb{C})$.
The map $\varphi: \mathbb{R}^+ \to GL_n$ defined by $\varphi(t) = e^{tA}$ is a group homomorphism.

(b) Conversely, let $\varphi: \mathbb{R}^+ \to GL_n$ be a differentiable map that is a homomorphism, and let
A denote its derivative $\varphi'(0)$ at the origin. Then $\varphi(t) = e^{tA}$ for all t.


Proof. For any real numbers $r$ and $s$, the matrices $rA$ and $sA$ commute. So (see (5.4.4)) 

(9.5.3) $$e^{(r+s)A} = e^{rA}e^{sA}.$$

This shows that $e^{tA}$ is a homomorphism. Conversely, let $\varphi: \mathbb{R}^+ \to GL_n$ be a differentiable 
homomorphism. Then $\varphi(\Delta t + t) = \varphi(\Delta t)\varphi(t)$ and $\varphi(t) = \varphi(0)\varphi(t)$, so we can factor $\varphi(t)$ 
out of the difference quotient: 

(9.5.4) $$\frac{\varphi(\Delta t + t) - \varphi(t)}{\Delta t} = \frac{\varphi(\Delta t) - \varphi(0)}{\Delta t}\,\varphi(t).$$

Taking the limit as $\Delta t \to 0$, we see that $\varphi'(t) = \varphi'(0)\varphi(t) = A\varphi(t)$. Therefore $\varphi(t)$ is a 
matrix-valued function that solves the differential equation 

(9.5.5) $$\frac{d\varphi}{dt} = A\varphi.$$

The function $e^{tA}$ is another solution, and when $t = 0$, both solutions take the value $I$. 
Therefore $\varphi(t) = e^{tA}$ (see (5.4.9)). □ 

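
Both halves of Theorem 9.5.2 can be checked numerically. The following Python sketch is an illustration, not part of the proof. It approximates $e^{tA}$ by a truncated series, compares $e^{(r+s)A}$ with $e^{rA}e^{sA}$, and compares the difference quotient at $t = 0$ with $A$. The helper names (`mat_mul`, `mat_exp`, `scale`, `max_diff`), the sample matrix, and the truncation at 30 terms are assumptions made for this example.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=30):
    """Truncated series I + A + A^2/2! + ... (a numerical approximation)."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]            # current term A^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def scale(c, A):
    return [[c * a for a in row] for row in A]

def max_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[0.5, -1.0], [2.0, 0.25]]      # an arbitrary sample matrix
r, s = 0.7, -0.3

# Part (a): e^{(r+s)A} should agree with e^{rA} e^{sA}.
lhs = mat_exp(scale(r + s, A))
rhs = mat_mul(mat_exp(scale(r, A)), mat_exp(scale(s, A)))
print(max_diff(lhs, rhs))           # expected to be at rounding-error level

# Part (b): the difference quotient at t = 0 should approximate A.
h = 1e-6
E_h = mat_exp(scale(h, A))
quotient = [[(E_h[i][j] - (1.0 if i == j else 0.0)) / h for j in range(2)]
            for i in range(2)]
print(max_diff(quotient, A))        # expected to be of the order of h
```
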
Examples 9.5.6 

(a) Let $A$ be the $2 \times 2$ matrix unit $e_{12}$. Then $A^2 = 0$. All but two terms of the series expansion 
for the exponential are zero, and $e^{tA} = I + e_{12}t$. 

If $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, then $e^{tA} = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$. 

(b) The usual parametrization of $SO_2$ is a one-parameter group. 

If $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, then $e^{tA} = \begin{bmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{bmatrix}$. 

(c) The usual parametrization of the unit circle in the complex plane is a one-parameter 
group in $U_1$. 

If $a$ is a nonzero real number and $\alpha = ai$, then $e^{t\alpha} = [\cos at + i \sin at]$. □ 
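
The closed forms in Examples 9.5.6 can be compared with a truncated exponential series. This is a minimal Python sketch under stated assumptions: the helpers `mat_mul` and `mat_exp`, the sample values of `t` and `a`, and the 40-term truncation are choices made here for illustration.

```python
import cmath
import math

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """Truncated exponential series (numerical approximation)."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

t = 0.8

# (a) tA with A = e_12 is nilpotent, so the series stops after two terms.
Ea = mat_exp([[0.0, t], [0.0, 0.0]])
print(Ea)                            # approximately [[1, t], [0, 1]]

# (b) tA with A = [[0, -1], [1, 0]] should give the rotation through angle t.
Eb = mat_exp([[0.0, -t], [t, 0.0]])
rotation = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
err_b = max(abs(Eb[i][j] - rotation[i][j]) for i in range(2) for j in range(2))
print(err_b)

# (c) In U_1 the exponential is the scalar one: e^{t a i} = cos(at) + i sin(at).
a = 1.7
err_c = abs(cmath.exp(1j * a * t) - complex(math.cos(a * t), math.sin(a * t)))
print(err_c)
```
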


If $\alpha$ is a complex number that is neither real nor purely imaginary, the image of $e^{t\alpha}$ in $\mathbb{C}^\times$ will be a 
logarithmic spiral. If $a$ is a nonzero real number, the image of $e^{ta}$ is the positive real axis, 
and if $a = 0$ the image consists of the point 1 alone. 

If we are given a subgroup $H$ of $GL_n$, we may also ask for one-parameter groups 
in $H$, meaning one-parameter groups whose images are in $H$, or differentiable 
homomorphisms $\varphi: \mathbb{R}^+ \to H$. It turns out that linear groups of positive dimension always 
have one-parameter groups, and they are usually not hard to determine for a particular 
group. 

Since the one-parameter groups are in bijective correspondence with $n \times n$ matrices, 
we are asking for the matrices $A$ such that $e^{tA}$ is in $H$ for all $t$. We will determine the 
one-parameter groups in the orthogonal, unitary, and special linear groups. 

(9.5.7) Images of Some One-Parameter Groups in $\mathbb{C}^\times = GL_1(\mathbb{C})$. 


Proposition 9.5.8 


(a) If $A$ is a real skew-symmetric matrix ($A^t = -A$), then $e^A$ is orthogonal. If $A$ is a complex 
skew-Hermitian matrix ($A^* = -A$), then $e^A$ is unitary. 

(b) The one-parameter groups in the orthogonal group $O_n$ are the homomorphisms $t \rightsquigarrow e^{tA}$, 
where $A$ is a real skew-symmetric matrix. 

(c) The one-parameter groups in the unitary group $U_n$ are the homomorphisms $t \rightsquigarrow e^{tA}$, 
where $A$ is a complex skew-Hermitian matrix. 


Proof. We discuss the complex case. 

The relation $(e^A)^* = e^{(A^*)}$ follows from the definition of the exponential, and we know 
that $(e^A)^{-1} = e^{-A}$ (5.4.5). So if $A$ is skew-Hermitian, i.e., $A^* = -A$, then $(e^A)^* = (e^A)^{-1}$, 
and $e^A$ is unitary. This proves (a) for complex matrices. 

Next, if $A$ is skew-Hermitian, so is $tA$, and by what was shown above, $e^{tA}$ is unitary 
for all $t$, so it is a one-parameter group in the unitary group. Conversely, suppose that $e^{tA}$ is 
unitary for all $t$. We write this as $e^{tA^*} = e^{-tA}$. Then the derivatives of the two sides of this 
equation, evaluated at $t = 0$, must be equal, so $A^* = -A$, and $A$ is skew-Hermitian. 

The proof for the orthogonal group is the same, when we interpret $A^*$ as $A^t$. □ 
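
Part (a) of Proposition 9.5.8 can be illustrated in the real case. The Python sketch below, whose helper names and sample matrices are assumptions made for this example, exponentiates a skew-symmetric $3 \times 3$ matrix with a truncated series and measures how far $(e^A)^t e^A$ is from $I$. For contrast it does the same after adding a symmetric perturbation, which should destroy orthogonality.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """Truncated exponential series (numerical approximation)."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def transpose(X):
    return [list(col) for col in zip(*X)]

def orthogonality_defect(E):
    """Largest entry of E^t E - I in absolute value."""
    G = mat_mul(transpose(E), E)
    n = len(E)
    return max(abs(G[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

# A skew-symmetric sample matrix: A^t = -A.
A = [[0.0, 1.2, -0.5], [-1.2, 0.0, 0.3], [0.5, -0.3, 0.0]]
defect_skew = orthogonality_defect(mat_exp(A))
print(defect_skew)                   # expected to be at rounding-error level

# The same matrix with a nonzero diagonal entry is no longer skew-symmetric.
M = [row[:] for row in A]
M[0][0] = 0.1
defect_other = orthogonality_defect(mat_exp(M))
print(defect_other)                  # expected to be clearly nonzero
```
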


We consider the special linear group $SL_n$ next. 

Lemma 9.5.9 For any square matrix $A$, $e^{\operatorname{trace} A} = \det e^A$. 

Proof. An eigenvector $X$ of $A$ with eigenvalue $\lambda$ is also an eigenvector of $e^A$ with eigenvalue 
$e^\lambda$. So, if $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$, then the eigenvalues of $e^A$ are $e^{\lambda_i}$. The trace 
of $A$ is the sum $\lambda_1 + \cdots + \lambda_n$, and the determinant of $e^A$ is the product $e^{\lambda_1} \cdots e^{\lambda_n}$ (4.5.15). 
Therefore $e^{\operatorname{trace} A} = e^{\lambda_1 + \cdots + \lambda_n} = e^{\lambda_1} \cdots e^{\lambda_n} = \det e^A$. □ 

Proposition 9.5.10 The one-parameter groups in the special linear group $SL_n$ are the 
homomorphisms $t \rightsquigarrow e^{tA}$, where $A$ is a real $n \times n$ matrix whose trace is zero. 


Proof. Lemma 9.5.9 shows that if trace $A = 0$, then $\det e^{tA} = e^{t \operatorname{trace} A} = e^0 = 1$ for all $t$, so 
$e^{tA}$ is a one-parameter group in $SL_n$. Conversely, if $\det e^{tA} = 1$ for all $t$, the derivative of 
$e^{t \operatorname{trace} A}$, evaluated at $t = 0$, is zero. The derivative is trace $A$. □ 


The simplest one-parameter group in $SL_2$ is the one in Example 9.5.6(a). The 
one-parameter groups in $SU_2$ are the longitudes described in (9.3.9). 


9.6 THE LIE ALGEBRA 


The space of tangent vectors to a matrix group G at the identity is called the Lie algebra of 
the group. We denote it by Lie(G). It is called an algebra because it has a law of composition, 
the bracket operation that is defined below. 

For instance, when we represent the circle group as the unit circle in the complex plane, 
the Lie algebra is the space of real multiples of i. 

The observation from which the definition of tangent vector is derived is something 
we learn in calculus: If $\varphi(t) = (\varphi_1(t), \ldots, \varphi_k(t))$ is a differentiable path in $\mathbb{R}^k$, the velocity 
vector $v = \varphi'(0)$ is tangent to the path at the point $x = \varphi(0)$. A vector $v$ is said to be tangent 
to a subset $S$ of $\mathbb{R}^k$ at a point $x$ if there is a differentiable path $\varphi(t)$, defined for sufficiently 
small $t$ and lying entirely in $S$, such that $\varphi(0) = x$ and $\varphi'(0) = v$. 

The elements of a linear group $G$ are matrices, so a path $\varphi(t)$ in $G$ will be a 
matrix-valued function. Its derivative $\varphi'(0)$ at $t = 0$ will be represented naturally as a matrix, 
and if $\varphi(0) = I$, the matrix $\varphi'(0)$ will be an element of Lie($G$). For example, the usual 
parametrization (9.5.6)(b) of the group $SO_2$ shows that the matrix $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ is in Lie($SO_2$). 

We already know a few paths in the orthogonal group $O_n$: the one-parameter 
groups $\varphi(t) = e^{At}$, where $A$ is a skew-symmetric matrix (9.5.8). Since $(e^{At})_{t=0} = I$ and 
$(\frac{d}{dt}e^{At})_{t=0} = A$, every skew-symmetric matrix $A$ is a tangent vector to $O_n$ at the identity, that is, an 
element of its Lie algebra. We show now that the Lie algebra consists precisely of those 
matrices. Since one-parameter groups are very special, this isn't completely obvious. There 
are many other paths. 


Proposition 9.6.1 The Lie algebra of the orthogonal group O, consists of the skew- 
symmetric matrices. 


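Proof. We denote transpose by $*$. If $\varphi$ is a path in $O_n$ with $\varphi(0) = I$ and $\varphi'(0) = A$, then 
$\varphi(t)^*\varphi(t) = I$ identically, and so $\frac{d}{dt}(\varphi(t)^*\varphi(t)) = 0$. Then 

$$0 = \frac{d}{dt}(\varphi^*\varphi)\Big|_{t=0} = \left(\frac{d\varphi^*}{dt}\,\varphi + \varphi^*\,\frac{d\varphi}{dt}\right)\Big|_{t=0} = A^* + A.$$ □ 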


Next, we consider the special linear group $SL_n$. The one-parameter groups in $SL_n$ 
have the form $\varphi(t) = e^{At}$, where $A$ is a trace-zero matrix (9.5.10). Since $(e^{At})_{t=0} = I$ and 
$(\frac{d}{dt}e^{At})_{t=0} = A$, every trace-zero matrix $A$ is a tangent vector to $SL_n$ at the identity, that is, an 
element of its Lie algebra. 

Lemma 9.6.2 Let $\varphi$ be a path in $GL_n$ with $\varphi(0) = I$ and $\varphi'(0) = A$. Then $\left(\frac{d}{dt}(\det \varphi)\right)_{t=0} = \operatorname{trace} A$. 


Proof. We write the matrix entries of $\varphi$ as $\varphi_{ij}$, and we compute $\frac{d}{dt}\det\varphi$ using the complete 
expansion (1.6.4) of the determinant: 

$$\det\varphi = \sum_{p \in S_n} (\operatorname{sign} p)\, \varphi_{1,p1} \cdots \varphi_{n,pn}.$$

By the product rule, 

(9.6.3) $$\frac{d}{dt}\left(\varphi_{1,p1} \cdots \varphi_{n,pn}\right) = \sum_{i=1}^{n} \varphi_{1,p1} \cdots \varphi'_{i,pi} \cdots \varphi_{n,pn}.$$

We evaluate at $t = 0$. Since $\varphi(0) = I$, $\varphi_{ij}(0) = 0$ if $i \neq j$ and $\varphi_{ii}(0) = 1$. So in the sum 
(9.6.3), the term $\varphi_{1,p1} \cdots \varphi'_{i,pi} \cdots \varphi_{n,pn}$ evaluates to zero unless $pj = j$ for all $j \neq i$, and 
if $pj = j$ for all $j \neq i$, then since $p$ is a permutation, $pi = i$ too, and therefore $p$ is the 
identity. So (9.6.3) evaluates to zero except when $p = 1$, and when $p = 1$, it becomes 
$\sum_i \varphi'_{ii}(0) = \operatorname{trace} A$. This is the derivative of $\det\varphi$. □ 
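
Lemma 9.6.2 can be tested on a path that is not a one-parameter group. The Python sketch below is a numerical illustration under stated assumptions: the path $\varphi(t) = I + tA + t^2C$, the sample matrices `A` and `C`, and the step size `h` are all chosen here for the example. The determinant is computed from the complete expansion over permutations, as in the proof, and its derivative at $t = 0$ is estimated by a central difference.

```python
from itertools import permutations

def sign(p):
    """Sign of a permutation (tuple of indices), found by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
                     if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det(M):
    """Determinant by the complete expansion over all permutations."""
    n = len(M)
    total = 0.0
    for p in permutations(range(n)):
        product = 1.0
        for i in range(n):
            product *= M[i][p[i]]
        total += sign(p) * product
    return total

A = [[1.0, 2.0, 0.5], [-3.0, 0.25, 4.0], [0.0, 1.5, -2.0]]
C = [[0.3, -1.0, 2.0], [1.0, 0.0, 0.7], [-0.4, 0.9, 1.1]]

def phi(t):
    """A sample path with phi(0) = I and phi'(0) = A."""
    return [[(1.0 if i == j else 0.0) + t * A[i][j] + t * t * C[i][j]
             for j in range(3)] for i in range(3)]

h = 1e-5
derivative = (det(phi(h)) - det(phi(-h))) / (2 * h)   # central difference
trace_A = sum(A[i][i] for i in range(3))
print(derivative, trace_A)           # the two values are expected to agree closely
```
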


Proposition 9.6.4 The Lie algebra of the special linear group $SL_n$ consists of the trace-zero 
matrices. 


Proof. If $\varphi$ is a path in the special linear group with $\varphi(0) = I$ and $\varphi'(0) = A$, then 
$\det(\varphi(t)) = 1$ identically, and therefore $\frac{d}{dt}\det(\varphi(t)) = 0$. Evaluating at $t = 0$, we obtain 
trace $A = 0$. □ 


Similar methods are used to describe the Lie algebras of other classical groups. Note 
also that the Lie algebras of $O_n$ and $SL_n$ are real vector spaces, subspaces of the space 
of matrices. It is usually easy to verify for other groups that Lie($G$) is a real vector 
space. 


The Lie Bracket 


The Lie algebra has an additional structure, an operation called the bracket, the law of 
composition defined by the rule 


(9.6.5) [A, B] = AB— BA. 


The bracket is a version of the commutator: It is zero if and only if A and B commute. It isn’t 
an associative law, but it satisfies an identity called the Jacobi identity: 


(9.6.6) [A, [B, C]] + [B, [C, A]] + [C, [A, B]] =0. 


To show that the bracket is defined on the Lie algebra, we must check that if $A$ and 
$B$ are in Lie($G$), then $[A, B]$ is also in Lie($G$). This can be done easily for any particular 
group. For the special linear group, the required verification is that if $A$ and $B$ have trace 
zero, then $AB - BA$ also has trace zero, which is true because trace $AB$ = trace $BA$. The Lie 
algebra of the orthogonal group is the space of skew-symmetric matrices. For that group, we 
must verify that if $A$ and $B$ are skew-symmetric, then $[A, B]$ is skew-symmetric: 

$$[A, B]^t = (AB)^t - (BA)^t = B^tA^t - A^tB^t = (-B)(-A) - (-A)(-B) = -[A, B].$$

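
The three facts used above (trace $[A, B] = 0$, closure of skew-symmetric matrices under the bracket, and the Jacobi identity) can be confirmed exactly on integer matrices. The Python sketch below uses sample matrices and helper names that are assumptions for this illustration; since the arithmetic is exact, equality can be tested directly.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def bracket(A, B):
    """The bracket [A, B] = AB - BA of (9.6.5)."""
    AB, BA = mat_mul(A, B), mat_mul(B, A)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(AB, BA)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def negate(X):
    return [[-x for x in row] for row in X]

def add(*mats):
    """Entrywise sum of several matrices of the same size."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]

# Three skew-symmetric integer matrices (sample data).
A = [[0, 1, -2], [-1, 0, 3], [2, -3, 0]]
B = [[0, 4, 1], [-4, 0, -5], [-1, 5, 0]]
C = [[0, -2, 7], [2, 0, 1], [-7, -1, 0]]

K = bracket(A, B)
print(transpose(K) == negate(K))     # the bracket is again skew-symmetric

jacobi = add(bracket(A, bracket(B, C)),
             bracket(B, bracket(C, A)),
             bracket(C, bracket(A, B)))
print(jacobi)                        # the zero matrix

# For arbitrary square matrices the bracket has trace zero.
M = [[2, -1, 0], [5, 3, 1], [4, 0, -6]]
N = [[1, 1, 2], [0, -3, 7], [8, 2, 5]]
T = bracket(M, N)
print(sum(T[i][i] for i in range(3)))
```
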
The definition of an abstract Lie algebra includes a bracket operation. 


Definition 9.6.7 A Lie algebra $V$ is a real vector space together with a law of composition 
$V \times V \to V$ denoted by $v, w \rightsquigarrow [v, w]$ and called the bracket, which satisfies these axioms 
for all $u, v, w$ in $V$ and all $c$ in $\mathbb{R}$: 


bilinearity: $[v_1 + v_2, w] = [v_1, w] + [v_2, w]$ and $[cv, w] = c[v, w]$, 
$[v, w_1 + w_2] = [v, w_1] + [v, w_2]$ and $[v, cw] = c[v, w]$, 
skew symmetry: $[v, w] = -[w, v]$, or $[v, v] = 0$, 
Jacobi identity: $[u, [v, w]] + [v, [w, u]] + [w, [u, v]] = 0$. 


Lie algebras are useful because, being vector spaces, they are easier to work with 
than linear groups. And, though this is not easy to prove, many linear groups, including the 
classical groups, are nearly determined by their Lie algebras. 


9.7 TRANSLATION IN A GROUP 


Let $P$ be an element of a matrix group $G$. Left multiplication by $P$ is a bijective map from $G$ 
to itself: 

(9.7.1) $$G \xrightarrow{\ m_P\ } G, \qquad X \rightsquigarrow PX.$$

Its inverse function is left multiplication by $P^{-1}$. The maps $m_P$ and $m_{P^{-1}}$ are continuous 
because matrix multiplication is continuous. Thus $m_P$ is a homeomorphism from $G$ to $G$ 
(not a homomorphism). It is also called left translation by $P$, in analogy with translation in 
the plane, which is left translation in the additive group $\mathbb{R}^{2+}$. 

The important property of a group that is implied by the existence of these maps is 
homogeneity. Multiplication by $P$ is a homeomorphism that carries the identity element $I$ 
to $P$. Intuitively, the group looks the same at $P$ as it does at $I$, and since $P$ is arbitrary, it 
looks the same at any two points. This is analogous to the fact that the plane looks the same 
everywhere. 

Left multiplication in the circle group $SO_2$ rotates the circle, and left multiplication 
in $SU_2$ is also a rigid motion of the 3-sphere. But homogeneity is weaker in other matrix 
groups. For example, let $G$ be the group of real invertible diagonal $2 \times 2$ matrices. If we 
identify the elements of $G$ with the points $(a, d)$ in the plane and not on the coordinate axes, 
multiplication by the matrix 


(9.7.2) Pe E 4 


distorts the group G, but it does this continuously. 



(9.7.3) Left Multiplication in a Group. 


Now the only geometrically reasonable subsets of $\mathbb{R}^k$ that have such a homogeneity 
property are manifolds. A manifold $M$ of dimension $d$ is a set in which every point has a 
neighborhood that is homeomorphic to an open set in $\mathbb{R}^d$ (see [Munkres], p. 155). It isn't 
surprising that the classical groups are manifolds, though there are subgroups of $GL_n$ that 
aren't. The group $GL_n(\mathbb{Q})$ of invertible matrices with rational coefficients is an interesting 
group, but it is a countable dense subset of the space of matrices. 

The following theorem gives a satisfactory answer to the question of which linear 
groups are manifolds: 


Theorem 9.7.4 A subgroup of $GL_n$ that is a closed subset of $GL_n$ is a manifold. 


Proving this theorem here would take us too far afield, but we illustrate it by showing 
that the orthogonal groups are manifolds. Proofs for the other classical groups are similar. 


Lemma 9.7.5 The matrix exponential $A \rightsquigarrow e^A$ maps a small neighborhood $U$ of 0 in $\mathbb{R}^{n \times n}$ 
homeomorphically to a neighborhood $V$ of $I$ in $GL_n(\mathbb{R})$. 


The fact that the exponential series converges uniformly on bounded sets of matrices implies 
that it is a continuous function ([Rudin] Thm 7.12). To prove the lemma, one needs to show 
that it has a continuous inverse function for matrices sufficiently near to $I$. This can be proved 
using the inverse function theorem, or the series for $\log(1 + x)$: 

(9.7.6) $$\log(1 + x) = x - \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \cdots$$

The series $\log(I + B)$ converges for small matrices $B$, and it inverts the exponential. □ 
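
The claim that the series (9.7.6) inverts the exponential near $I$ can be observed numerically. In this Python sketch the helper names, the sample matrix, and the truncation lengths are assumptions made for illustration: a small matrix $A$ is exponentiated, the logarithm series is applied to the result, and the output is compared with $A$.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """Truncated exponential series (numerical approximation)."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def mat_log_near_identity(M, terms=60):
    """Series B - B^2/2 + B^3/3 - ... with B = M - I; meaningful only for small B."""
    n = len(M)
    B = [[M[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    result = [[0.0] * n for _ in range(n)]
    power = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        power = mat_mul(power, B)                 # now holds B^k
        sgn = 1.0 if k % 2 == 1 else -1.0
        result = [[result[i][j] + sgn * power[i][j] / k for j in range(n)]
                  for i in range(n)]
    return result

A = [[0.1, -0.2], [0.05, 0.15]]     # a sample matrix near 0
recovered = mat_log_near_identity(mat_exp(A))
error = max(abs(recovered[i][j] - A[i][j]) for i in range(2) for j in range(2))
print(error)                         # expected to be at rounding-error level
```
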


(9.7.7) The Matrix Exponential. 

Proposition 9.7.8 The orthogonal group $O_n$ is a manifold of dimension $\frac{1}{2}n(n-1)$. 


Proof. We denote the group $O_n$ by $G$, and its Lie algebra, the space of skew-symmetric 
matrices, by $L$. If $A$ is skew-symmetric, then $e^A$ is orthogonal (9.5.8). So the exponential 
maps $L$ to $G$. Conversely, suppose that $A$ is near 0. Then, denoting transpose by $*$, $A^*$ and 
$-A$ are also near zero, and $e^{A^*}$ and $e^{-A}$ are near to $I$. If $e^A$ is orthogonal, i.e., if $e^{A^*} = e^{-A}$, 
Lemma (9.7.5) tells us that $A^* = -A$, so $A$ is skew-symmetric. Therefore a matrix $A$ near 0 
is in $L$ if and only if $e^A$ is in $G$. This shows that the exponential defines a homeomorphism 
from a neighborhood $V$ of 0 in $L$ to a neighborhood $U$ of $I$ in $G$. Since $L$ is a vector space, 
it is a manifold. The condition for a manifold is satisfied by the orthogonal group at the 
identity. Homogeneity implies that it is satisfied at all points. Therefore $G$ is a manifold, and 
its dimension is the same as that of $L$, namely $\frac{1}{2}n(n-1)$. □ 


Here is another application of the principle of homogeneity. 


Proposition 9.7.9 Let G be a path-connected matrix group, and let H be a subgroup of G 
that contains a nonempty open subset U of G. Then H = G. 


Proof. A subset $S$ of $\mathbb{R}^k$ is path connected if any two points of $S$ can be joined by a continuous 
path lying entirely in $S$ (see [Munkres, p. 155] or Chapter 2, Exercise M.6). 

Since left multiplication by an element $g$ is a homeomorphism from $G$ to $G$, the set 
$gU$ is also open, and it is contained in a single coset of $H$, namely in $gH$. Since the translates 
of $U$ cover $G$, the ones contained in a coset $C$ cover that coset. So each coset is a union 
of open subsets of $G$, and therefore is open itself. Then $G$ is partitioned into open subsets, 
the cosets of $H$. A path-connected set is not a disjoint union of proper open subsets (see 
[Munkres, p. 155]). Thus there can be only one coset, and $H = G$. □ 


We use this proposition to determine the normal subgroups of $SU_2$. 


Theorem 9.7.10 


(a) The only proper normal subgroup of $SU_2$ is its center $\{\pm I\}$. 
(b) The rotation group $SO_3$ is a simple group. 


Proof. (a) Let $N$ be a normal subgroup of $SU_2$ that contains an element $P \neq \pm I$. We must 
show that $N$ is equal to $SU_2$. Since $N$ is normal, it contains the conjugacy class $C$ of $P$, which 
is a latitude, a 2-sphere. 

We choose a continuous map $P(t)$ from the unit interval $[0, 1]$ to $C$ such that $P(0) = P$ 
and $P(1) \neq P$, and we form the path $Q(t) = P(t)P^{-1}$. Then $Q(0) = I$, and $Q(1) \neq I$, so this 
path leads out from the identity $I$, as in the figure below. Since $N$ is a group that contains 
$P$ and $P(t)$, it also contains $Q(t)$ for every $t$ in the interval $[0, 1]$. We don't need to know 
anything else about the path $Q(t)$. 

We note that trace $Q \leq 2$ for any $Q$ in $SU_2$, and that $I$ is the only matrix with trace equal 
to 2. Therefore trace $Q(0) = 2$ and trace $Q(1) = \tau < 2$. By continuity, all values between $\tau$ 
and 2 are taken on by trace $Q(t)$. Since $N$ is normal, it contains the conjugacy class of $Q(t)$ 
for every $t$, and a conjugacy class in $SU_2$ consists of all elements with a given trace. 
Therefore $N$ contains all elements of $SU_2$ whose traces are sufficiently near to 2, 
and this includes all matrices near to the identity. So $N$ contains an open neighborhood of 
the identity in $SU_2$. Since $SU_2$ is path-connected, Proposition 9.7.9 shows that $N = SU_2$. 


(b) There is a surjective map $\varphi: SU_2 \to SO_3$ whose kernel is $\{\pm I\}$ (9.4.1). By the 
Correspondence Theorem 2.10.5, the inverse image of a normal subgroup in $SO_3$ is a normal 
subgroup of $SU_2$ that contains $\{\pm I\}$. Part (a) tells us that there are no proper normal subgroups of 
$SU_2$ except $\{\pm I\}$, so $SO_3$ contains no proper normal subgroup at all. □ 


One can apply translation in a group $G$ to tangent vectors too. If $A$ is a tangent vector 
at the identity and if $P$ is an element of $G$, the vector $PA$ is tangent to $G$ at $P$, and if $A$ isn't 
zero, neither is $PA$. As $P$ ranges over the group, the family of these vectors forms what is 
called a tangent vector field. Now just the existence of a continuous tangent vector field that is 
nowhere zero puts strong restrictions on the space $G$. It is a theorem of topology, sometimes 
called the "Hairy Ball Theorem," that any continuous tangent vector field on the 2-sphere must vanish 
at some point (see [Milnor]). This is one reason that the 2-sphere has no group structure. 
But since the 3-sphere is a group, it has tangent vector fields that are nowhere zero. 


9.8 NORMAL SUBGROUPS OF SL2 


Let $F$ be a field. The center of the group $SL_2(F)$ is $\{\pm I\}$. (This is Exercise 8.5.) The quotient 
group $SL_2(F)/\{\pm I\}$ is called the projective group, and is denoted by $PSL_2(F)$. Its elements 
are the cosets $\{\pm P\}$. 


Theorem 9.8.1 Let F be a field of order at least four. 


(a) The only proper normal subgroup of $SL_2(F)$ is its center $Z = \{\pm I\}$. 
(b) The projective group $PSL_2(F)$ is a simple group. 


Part (b) of the theorem follows from (a) and the Correspondence Theorem 2.10.5, 
and it identifies an interesting class of finite simple groups: the projective groups PSL2(F) 
when F is a finite field. The other finite, nonabelian simple groups that we have seen are the 
alternating groups (7.5.4). 

We will show in Chapter 15 that the order of a finite field is always a power of a 
prime, that for every prime power $q = p^e$, there is a field $F_q$ of order $q$, and that $F_q$ has 
characteristic $p$ (Theorem 15.7.3). Finite fields of order $2^e$ have characteristic 2. In those 
fields, $1 = -1$ and $I = -I$. Then the center of $SL_2(F_q)$ is the trivial group. Let's assume these 
facts for now. 

We omit the proof of the next lemma. (See Chapter 3, Exercise 4.4 for the case that $q$ 
is a prime.) 


Lemma 9.8.2 Let $q$ be a power of a prime. The order of $SL_2(F_q)$ is $q^3 - q$. If $q$ is not a power 
of 2, the order of $PSL_2(F_q)$ is $\frac{1}{2}(q^3 - q)$. If $q$ is a power of 2, then $PSL_2(F_q) \approx SL_2(F_q)$, 
and the order of $PSL_2(F_q)$ is $q^3 - q$. □ 


The orders of $PSL_2$ for small $q$ are listed below, along with the orders of the first three 
simple alternating groups. 


|F|        4    5    7    8    9   11    13    16    17    19 
|PSL_2|   60   60  168  504  360  660  1092  4080  2448  3420 


n          5    6     7 
|A_n|     60  360  2520 


The orders of the ten smallest nonabelian simple groups appear in this list. The next smallest 
would be PSL3(F3), which has order 5616. 
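
The table can be regenerated from the formulas of Lemma 9.8.2. The following Python sketch is an illustration: the function names are chosen here, and the argument `q` is assumed (not checked) to be a prime power.

```python
import math

def psl2_order(q):
    """Order of PSL_2(F_q) from Lemma 9.8.2; q is assumed to be a prime power."""
    sl2_order = q ** 3 - q
    # When q is a power of 2 the center is trivial, so PSL_2 = SL_2.
    return sl2_order if q % 2 == 0 else sl2_order // 2

def alternating_order(n):
    """Order of the alternating group A_n, for n >= 2."""
    return math.factorial(n) // 2

field_orders = [4, 5, 7, 8, 9, 11, 13, 16, 17, 19]
psl_orders = [psl2_order(q) for q in field_orders]
alt_orders = [alternating_order(n) for n in (5, 6, 7)]

print(psl_orders)
print(alt_orders)
# Distinct orders appearing in the two tables, in increasing order.
print(sorted(set(psl_orders + alt_orders)))
```
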

The projective group is not simple when $|F| = 2$ or 3. $PSL_2(F_2)$ is isomorphic to the 
symmetric group $S_3$ and $PSL_2(F_3)$ is isomorphic to the alternating group $A_4$. 

As shown in these tables, $PSL_2(F_4)$, $PSL_2(F_5)$, and $A_5$ have order 60. These three 
groups happen to be isomorphic. (This is Exercise 8.3.) The other coincidences among orders 
are the groups $PSL_2(F_9)$ and $A_6$, which have order 360. They are isomorphic too. 


For the proof, we will leave the cases $|F| = 4$ and 5 aside, so that we can use the next 
lemma. 


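Lemma 9.8.3 A field $F$ of order greater than 5 contains an element $r$ whose square is not 
0, 1, or $-1$. 


Proof. The only element with square 0 is 0, and the elements with square 1 are $\pm 1$. There 
are at most two elements whose squares are $-1$: If $a^2 = b^2 = -1$, then $(a - b)(a + b) = 0$, so 
$b = \pm a$. So at most five elements of $F$ have square 0, 1, or $-1$, and a field of order greater than 5 
contains an element $r$ outside this set. □ 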
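
For prime fields, Lemma 9.8.3 can be confirmed by brute force, and the same search shows why the bound 5 cannot be lowered. The Python sketch below is an illustration that treats only the fields $\mathbb{Z}/p$ with $p$ prime (not general finite fields); the function name is chosen here.

```python
def good_elements(p):
    """Elements r of Z/p whose square is not 0, 1, or -1 (p assumed prime)."""
    excluded = {0, 1 % p, (p - 1) % p}
    return [r for r in range(p) if (r * r) % p not in excluded]

for p in (2, 3, 5, 7, 11, 13):
    print(p, good_elements(p))
# For p = 2, 3, 5 the list is empty; for p = 7 it is [2, 3, 4, 5].
```
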


Proof of Theorem 9.8.1. We assume given the field $F$, we let $SL_2$ and $PSL_2$ stand for 
$SL_2(F)$ and $PSL_2(F)$, respectively, and we denote the space $F^2$ by $V$. We choose a nonzero 
element $r$ of $F$ whose square $s$ is not $\pm 1$. 

Let $N$ be a normal subgroup of $SL_2$ that contains an element $A \neq \pm I$. We must show 
that $N$ is the whole group $SL_2$. Since $A$ is arbitrary, it is hard to work with directly. The 
strategy is to begin by showing that $N$ contains a matrix that has eigenvalue $s$. 


Step 1: There is a matrix $P$ in $SL_2$ such that the commutator $B = APA^{-1}P^{-1}$ is in $N$, and has 
eigenvalues $s$ and $s^{-1}$. 


This is a nice trick. We choose a vector $v_1$ in $V$ that is not an eigenvector of $A$ and we 
let $v_2 = Av_1$. Then $v_1$ and $v_2$ are independent, so $\mathbf{B} = (v_1, v_2)$ is a basis of $V$. (It is easy to 
check that the only matrices in $SL_2$ for which every vector is an eigenvector are $I$ and $-I$.) 

Let $R$ be the diagonal matrix with diagonal entries $r$ and $r^{-1}$. The matrix $P = [\mathbf{B}]R[\mathbf{B}]^{-1}$ 
has determinant 1, and $v_1$ and $v_2$ are eigenvectors, with eigenvalues $r$ and $r^{-1}$, respectively 
(4.6.10). Because $N$ is a normal subgroup, the commutator $B = APA^{-1}P^{-1}$ is an element of 
$N$ (see (7.5.4)). Then 

$$Bv_2 = APA^{-1}P^{-1}v_2 = APA^{-1}(rv_2) = rAPv_1 = r^2Av_1 = sv_2.$$

Therefore $s$ is an eigenvalue of $B$. Because $\det B = 1$, the other eigenvalue is $s^{-1}$. 

Step 2: The matrices having eigenvalues $s$ and $s^{-1}$ form a single conjugacy class $C$ in $SL_2$, 
and this conjugacy class is contained in $N$. 


The elements $s$ and $s^{-1}$ are distinct because $s \neq \pm 1$. Let $S$ be a diagonal matrix with 
diagonal entries $s$ and $s^{-1}$. Every matrix $Q$ with eigenvalues $s$ and $s^{-1}$ is a conjugate of $S$ in 
$GL_2(F)$ (4.4.8)(b), say $Q = LSL^{-1}$. Since $S$ is diagonal, it commutes with any other diagonal 
matrix. We can multiply $L$ on the right by a suitable diagonal matrix, to make $\det L = 1$, 
while preserving the equation $Q = LSL^{-1}$. So $Q$ is a conjugate of $S$ in $SL_2$. This shows that 
the matrices with eigenvalues $s$ and $s^{-1}$ form a single conjugacy class. By Step 1, the normal 
subgroup $N$ contains one such matrix. So $C \subset N$. 


Step 3: The elementary matrices $E = \begin{bmatrix} 1 & x \\ 0 & 1 \end{bmatrix}$ and $E^t = \begin{bmatrix} 1 & 0 \\ x & 1 \end{bmatrix}$, with $x$ in $F$, are in $N$. 


For any element $x$ of $F$, the terms on the left side of the equation 

$$\begin{bmatrix} s^{-1} & 0 \\ 0 & s \end{bmatrix} \begin{bmatrix} s & sx \\ 0 & s^{-1} \end{bmatrix} = \begin{bmatrix} 1 & x \\ 0 & 1 \end{bmatrix} = E$$

are in $C$ and in $N$, so $E$ is in $N$. One sees similarly that $E^t$ is in $N$. 


Step 4: The matrices $E$ and $E^t$, with $x$ in $F$, generate $SL_2$. Therefore $N = SL_2$. 


The proof of this is Exercise 4.8 of Chapter 2. □ 
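
Steps 1 and 3 can be traced through a concrete case. The Python sketch below works in the sample field $F_7 = \mathbb{Z}/7$ with $r = 3$, so that $s = r^2 = 2$. The field, the matrix $A$, the vector $v_1$, and all helper names are assumptions chosen for this illustration, not part of the proof. The code builds $P$ and the commutator $B$, checks that $Bv_2 = sv_2$, and checks the matrix identity of Step 3 for every $x$ in the field.

```python
p = 7                                # sample field F_7 = Z/7 (illustrative choice)

def mul(X, Y):
    """Product of 2x2 matrices with entries reduced mod p."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) % p for j in range(2)]
            for i in range(2)]

def inv(M):
    """Inverse of an invertible 2x2 matrix over Z/p."""
    (a, b), (c, d) = M
    det_inv = pow((a * d - b * c) % p, p - 2, p)   # inverse via Fermat's theorem
    return [[d * det_inv % p, -b * det_inv % p],
            [-c * det_inv % p, a * det_inv % p]]

def apply(M, v):
    return [(M[0][0] * v[0] + M[0][1] * v[1]) % p,
            (M[1][0] * v[0] + M[1][1] * v[1]) % p]

r = 3
s = r * r % p                        # s = 2, which is not 0, 1, or -1 in F_7
A = [[1, 1], [0, 1]]                 # an element of SL_2(F_7) other than I, -I
v1 = [0, 1]                          # not an eigenvector of A
v2 = apply(A, v1)

basis = [[v1[0], v2[0]], [v1[1], v2[1]]]           # columns v1, v2
R = [[r, 0], [0, pow(r, p - 2, p)]]                # diag(r, r^-1)
P = mul(mul(basis, R), inv(basis))
B = mul(mul(A, P), mul(inv(A), inv(P)))            # commutator A P A^-1 P^-1

# Step 1: v2 is an eigenvector of B with eigenvalue s.
step1 = apply(B, v2) == [s * x % p for x in v2]
print(B, step1)

# Step 3: diag(s^-1, s) * [[s, s x], [0, s^-1]] = [[1, x], [0, 1]] for all x.
s_inv = pow(s, p - 2, p)
step3 = all(mul([[s_inv, 0], [0, s]], [[s, s * x % p], [0, s_inv]])
            == [[1, x], [0, 1]] for x in range(p))
print(step3)
```
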


As is shown by the alternating groups and the projective groups, simple groups arise 
frequently, and this is one of the reasons that they have been studied intensively. On the 
other hand, simplicity is a very strong restriction on a group. There couldn’t be too many of 
them. A famous theorem of Cartan is one manifestation of this. 

A complex algebraic group is a subgroup of the complex general linear group $GL_n(\mathbb{C})$ 
which is the locus of complex solutions of a finite system of complex polynomial equations 
in the matrix entries. Cartan's theorem lists the simple complex algebraic groups. In the 
statement of the theorem, we use the symbol $Z$ to denote the center of a group. 


Theorem 9.8.4 


(a) The centers of the groups $SL_n(\mathbb{C})$, $SO_n(\mathbb{C})$, and $SP_{2n}(\mathbb{C})$ are finite cyclic groups. 


(b) For $n > 1$, the groups $SL_n(\mathbb{C})/Z$, $SO_n(\mathbb{C})/Z$, and $SP_{2n}(\mathbb{C})/Z$ are path-connected 
complex algebraic groups. Except for $SO_2(\mathbb{C})/Z$ and $SO_4(\mathbb{C})/Z$, they are simple. 

(c) In addition to the isomorphism classes of these groups, there are exactly five isomorphism 
classes of simple, path-connected complex algebraic groups, called the exceptional 
groups. 


Theorem 9.8.4 is based on a classification of the corresponding Lie algebras. It is too hard to 
prove here. 


A large project, the classification of the finite simple groups, was completed in 1980. 
The finite simple groups we have seen are the groups of prime order, the alternating groups 
$A_n$ with $n \geq 5$, and the groups $PSL_2(F)$ when $F$ is a finite field of order at least 4. Matrix 
groups play a dominant role in the classification of the finite simple groups too. Each of the 
forms (9.8.4) leads to a whole series of finite simple groups when finite fields are substituted 
for the complex field. There are also some finite simple groups analogous to the unitary 
groups. All of these finite linear groups are said to be of Lie type. In addition to the groups 
of prime order, the alternating groups, and the groups of Lie type, there are 26 finite simple 
groups called the sporadic groups. The smallest sporadic group is the Mathieu group $M_{11}$, 
whose order is 7920. The largest, the Monster, has order roughly $10^{53}$. 


It seems unfair to crow about the successes of a theory 
and to sweep all its failures under the rug. 


—Richard Brauer 


EXERCISES 


Section 1 The Classical Linear Groups 
1.1. (a) Is $GL_n(\mathbb{C})$ isomorphic to a subgroup of $GL_{2n}(\mathbb{R})$? 
(b) Is $SO_2(\mathbb{C})$ a bounded subset of $\mathbb{C}^{2 \times 2}$? 


1.2. A matrix $P$ is orthogonal if and only if its columns form an orthonormal basis. Describe 
the properties of the columns of a matrix in the Lorentz group $O_{3,1}$. 


1.3. Prove that there is no continuous isomorphism from the orthogonal group $O_4$ to the 
Lorentz group $O_{3,1}$. 


1.4. Describe by equations the group $O_{1,1}$ and show that it has four path-connected 
components. 


1.5. Prove that $SP_2 = SL_2$, but that $SP_4 \neq SL_4$. 
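1.6. Prove that the following matrices are symplectic, if the blocks are $n \times n$: 

$$\begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}, \quad \begin{bmatrix} A^t & 0 \\ 0 & A^{-1} \end{bmatrix}, \quad \begin{bmatrix} I & B \\ 0 & I \end{bmatrix},$$ where $B = B^t$ and $A$ is invertible. 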


*1.7. Prove that 
(a) the symplectic group $SP_{2n}$ operates transitively on $\mathbb{R}^{2n}$, 
(b) $SP_{2n}$ is path-connected, 
(c) symplectic matrices have determinant 1. 


Section 2 Interlude: Spheres 


2.1. Compute the formula for the inverse of the stereographic projection $\pi: S^3 \to \mathbb{R}^3$. 


2.2. One can parametrize proper subspaces of $\mathbb{R}^2$ by a circle in two ways. First, if a subspace 
$W$ intersects the horizontal axis with angle $\theta$, one can use the double angle $\alpha = 2\theta$. The 
double angle eliminates the ambiguity between $\theta$ and $\theta + \pi$. Or, one can choose a nonzero 
vector $(y_1, y_2)$ in $W$, and use the inverse of stereographic projection to map the slope 
$\lambda = y_2/y_1$ to a point of $S^1$. Compare these two parametrizations. 


2.3. (unit vectors and subspaces in $\mathbb{C}^2$) A proper subspace $W$ of the vector space $\mathbb{C}^2$ has 
dimension 1. Its slope is defined to be $\lambda = y_2/y_1$, where $(y_1, y_2)$ is a nonzero vector in 
$W$. The slope can be any complex number, or when $y_1 = 0$, $\lambda = \infty$. 


(a) Let $z = v_1 + v_2 i$. Write the formula for stereographic projection $\pi$ (9.2.2) and its 
inverse function $\sigma$ in terms of $z$. 

(b) The function that sends a unit vector $(y_1, y_2)$ to $\sigma(y_2/y_1)$ defines a map from the 
unit sphere $S^3$ in $\mathbb{C}^2$ to the two-sphere $S^2$. This map can be used to parametrize 
subspaces by points of $S^2$. Compute the function $\sigma(y_2/y_1)$ on unit vectors $(y_1, y_2)$. 

(c) What pairs of points of $S^2$ correspond to pairs of subspaces $W$ and $W'$ that are 
orthogonal with respect to the standard Hermitian form on $\mathbb{C}^2$? 


Section 3 The Special Unitary Group $SU_2$ 


3.1. Let $P$ and $Q$ be elements of $SU_2$, represented by the real vectors $(x_0, x_1, x_2, x_3)$ 
and $(y_0, y_1, y_2, y_3)$, respectively. Compute the real vector that corresponds to the 
product $PQ$. 


3.2. Prove that $U_2$ is homeomorphic to the product $S^3 \times S^1$. 

3.3. Prove that every great circle in $SU_2$ (circle of radius 1) is a coset of one of the longitudes. 

3.4. Determine the centralizer of $\mathbf{j}$ in $SU_2$. 


Section 4 The Rotation Group SO3 


4.1. Let $W$ be the space of real skew-symmetric $3 \times 3$ matrices. Describe the orbits for the 
operation $P * A = PAP^t$ of $SO_3$ on $W$. 


4.2. The rotation group $SO_3$ may be mapped to a 2-sphere by sending a rotation matrix to its 
first column. Describe the fibres of this map. 


4.3. Extend the orthogonal representation $\varphi: SU_2 \to SO_3$ to a homomorphism 
$\Phi: U_2 \to SO_3$, and describe the kernel of $\Phi$. 


4.4. (a) With notation as in (9.4.1), compute the matrix of the rotation $\gamma_P$, and show that its 
trace is $1 + 2\cos 2\theta$. 
(b) Prove directly that the matrix is orthogonal. 


4.5. Prove that conjugation by an element of $SU_2$ rotates every latitude. 


4.6. Describe the conjugacy classes in $SO_3$ in two ways: 

(a) Its elements operate on $\mathbb{R}^3$ as rotations. Which rotations make up a conjugacy 
class? 

(b) The spin homomorphism $SU_2 \to SO_3$ can be used to relate the conjugacy classes in 
the two groups. Do this. 

(c) The conjugacy classes in $SU_2$ are spheres. Describe the conjugacy classes in $SO_3$ 
geometrically. Be careful. 


4.7. (a) Calculate left multiplication by a fixed matrix $P$ in $SU_2$ explicitly, in terms of the 
coordinate vector $(x_0, x_1, x_2, x_3)$. Prove that it is given as multiplication by a $4 \times 4$ 
orthogonal matrix $Q$. 

(b) Prove that $Q$ is orthogonal by a method similar to that used in describing the 
orthogonal representation: Express the dot product of the vectors $(x_0, x_1, x_2, x_3)$ and 
$(x'_0, x'_1, x'_2, x'_3)$ that correspond to matrices $P$ and $P'$ in $SU_2$, in matrix terms. 


4.8. Let $W$ be the real vector space of Hermitian $2 \times 2$ matrices. 

(a) Prove that the rule $P \cdot A = PAP^*$ defines an operation of $SL_2(\mathbb{C})$ on $W$. 

(b) Prove that the function $\langle A, A' \rangle = \det(A + A') - \det A - \det A'$ is a bilinear form on 
$W$, and that its signature is $(3, 1)$. 

(c) Use (a) and (b) to define a homomorphism $\varphi: SL_2(\mathbb{C}) \to O_{3,1}$, whose kernel is $\{\pm I\}$. 


*4.9. (a) Let $H_i$ be the subgroup of $SO_3$ of rotations about the $x_i$-axis, $i = 1, 2, 3$. Prove that 
every element of $SO_3$ can be written as a product $ABA'$, where $A$ and $A'$ are in $H_1$ and 
$B$ is in $H_2$. Prove that this representation is unique unless $B = I$. 

(b) Describe the double cosets $H_1 Q H_1$ geometrically (see Chapter 2, Exercise M.9). 


Section 5 One-Parameter Groups 


5.1. Can the image of a one-parameter group in $GL_n$ cross itself? 


5.2. Determine the one-parameter groups in $U_2$. 


5.3. Describe by equations the images of the one-parameter groups in the group of real, 
invertible, $2 \times 2$ diagonal matrices, and make a drawing showing some of them in the 
plane. 


5.4. Find the conditions on a matrix $A$ so that $e^{tA}$ is a one-parameter group in 

(a) the special unitary group $SU_n$, (b) the Lorentz group $O_{3,1}$. 


5.5. Let $G$ be the group of real matrices of the form $\begin{bmatrix} x & y \\ 0 & 1 \end{bmatrix}$ with $x > 0$. 

(a) Determine the matrices $A$ such that $e^{tA}$ is a one-parameter group in $G$. 
(b) Compute $e^{tA}$ explicitly for the matrices in (a). 
(c) Make a drawing showing some one-parameter groups in the $(x, y)$-plane. 


5.6. Let $G$ be the subgroup of $GL_2$ of matrices $\begin{bmatrix} x & y \\ 0 & x^{-1} \end{bmatrix}$ with $x > 0$ and $y$ arbitrary. 
Determine the conjugacy classes in $G$, and the matrices $A$ such that $e^{tA}$ is a 
one-parameter group in $G$. 


5.7. Determine the one-parameter groups in the group of invertible $n \times n$ upper triangular 
matrices. 


5.8. Let $\varphi(t) = e^{tA}$ be a one-parameter group in a subgroup $G$ of $GL_n$. Prove that the cosets 
of its image are matrix solutions of the differential equation $dX/dt = AX$. 


5.9. Let $\varphi: \mathbb{R}^+ \to GL_n$ be a one-parameter group. Prove that $\ker\varphi$ is either trivial, or an 
infinite cyclic group, or the whole group. 


5.10. Determine the differentiable homomorphisms from the circle group $SO_2$ to $GL_n$. 


Section 6 The Lie Algebra 


6.1. Verify the Jacobi identity for the bracket operation [A, B] = AB — BA. 


6.2. Let V be a real vector space of dimension 2, with a law of composition [v, w] that is 
bilinear and skew-symmetric (see (9.6.7)). Prove that the Jacobi identity holds. 


6.3. The group SL operates by conjugation on the space of trace-zero matrices. Decompose 
this space into orbits. 


6.4. Let $G$ be the group of invertible real matrices of the form $\begin{bmatrix} a & b \\ 0 & a^2 \end{bmatrix}$. Determine the Lie 
algebra $L$ of $G$, and compute the bracket on $L$. 

6.5. Show that the set defined by $xy = 1$ is a subgroup of the group of invertible diagonal $2 \times 2$ 
matrices, and compute its Lie algebra. 

6.6. (a) Show that $O_2$ operates by conjugation on its Lie algebra. 
(b) Show that this operation is compatible with the bilinear form $\langle A, B \rangle = \frac{1}{2}\operatorname{trace} AB$. 
(c) Use the operation to define a homomorphism $O_2 \to O_2$, and describe this 
homomorphism explicitly. 

6.7. Determine the Lie algebras of the following groups. 
(a) $U_n$, (b) $SU_n$, (c) $O_{3,1}$, (d) $SO_n(\mathbb{C})$. 

6.8. Determine the Lie algebra of $SP_{2n}$, using block form $M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$. 

6.9. (a) Show that the vector cross product makes $\mathbb{R}^3$ into a Lie algebra $L_1$. 

(b) Let $L_2 = \mathrm{Lie}(SU_2)$, and let $L_3 = \mathrm{Lie}(SO_3)$. Prove that the three Lie algebras 
$L_1$, $L_2$ and $L_3$ are isomorphic. 


6.10. Classify complex Lie algebras of dimension < 3. 


6.11. Let B be a real $n \times n$ matrix, and let $\langle\ ,\ \rangle$ be the bilinear form $X^tBY$. The orthogonal
group G of this form is defined to be the group of matrices P such that $P^tBP = B$.
Determine the one-parameter groups in G, and the Lie algebra of G.


Section 7 Translation in a Group 


7.1. Prove that the unitary group $U_n$ is path connected.
7.2. Determine the dimensions of the following groups:
(a) $U_n$, (b) $SU_n$, (c) $SO_n(\mathbb{C})$, (d) $O_{3,1}$, (e) $SP_{2n}$.
7.3. Using the exponential, find all solutions near I of the equation $P^2 = I$.
7.4. Find a path-connected, nonabelian subgroup of $GL_2$ of dimension 2.
*7.5. (a) Prove that the exponential map defines a bijection between the set of all Hermitian
matrices and the set of positive definite Hermitian matrices.
(b) Describe the topological structure of $GL_2(\mathbb{C})$ using the Polar decomposition
(Chapter 8, Exercise M.8) and (a).
7.6. Sketch the tangent vector field PA to the group $\mathbb{C}^\times$, when $A = 1 + i$.
7.7. Let H be a finite normal subgroup of a path connected group G. Prove that H is contained
in the center of G.


Section 8 Normal Subgroups of SL 


8.1. Prove Theorem 9.8.1 for the cases $F = \mathbb{F}_4$ and $\mathbb{F}_5$.

8.2. Describe isomorphisms $PSL_2(\mathbb{F}_2) \approx S_3$ and $PSL_2(\mathbb{F}_3) \approx A_4$.

8.3. (a) Determine the numbers of Sylow p-subgroups of $PSL_2(\mathbb{F}_5)$, for p = 2, 3, 5.
(b) Prove that the three groups $A_5$, $PSL_2(\mathbb{F}_4)$, and $PSL_2(\mathbb{F}_5)$ are isomorphic.

8.4. (a) Write the polynomial equations that define the symplectic group. 


(b) Show that the unitary group $U_n$ can be defined by real polynomial equations in the
real and imaginary parts of the matrix entries.


8.5. Determine the centers of the groups $SL_n(\mathbb{R})$ and $SL_n(\mathbb{C})$.
8.6. Determine all normal subgroups of $GL_2(\mathbb{R})$ that contain its center.
8.7. With Z denoting the center of a group, is $PSL_n(\mathbb{C})$ isomorphic to $GL_n(\mathbb{C})/Z$? Is
$PSL_n(\mathbb{R})$ isomorphic to $GL_n(\mathbb{R})/Z$?
8.8. (a) Let P be a matrix in the center of $SO_n$, and let A be a skew-symmetric matrix. Prove
that PA = AP.
(b) Prove that the center of $SO_n$ is trivial if n is odd and is $\{\pm I\}$ if n is even and $n \ge 4$.
8.9. Compute the orders of the groups
(a) $SO_2(\mathbb{F}_3)$, (b) $SO_3(\mathbb{F}_3)$, (c) $SO_2(\mathbb{F}_5)$, (d) $SO_3(\mathbb{F}_5)$.


*8.10. (a) Let V be the space of complex $2 \times 2$ matrices, with the basis $(e_{11}, e_{12}, e_{21}, e_{22})$.
Write the matrix of conjugation by $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ on V in block form.


(b) Prove that conjugation defines a homomorphism $\varphi: SL_2(\mathbb{C}) \to GL_4(\mathbb{C})$, and that
the image of $\varphi$ is isomorphic to $PSL_2(\mathbb{C})$.


(c) Prove that $PSL_2(\mathbb{C})$ is a complex algebraic group by finding polynomial equations
in the entries $y_{ij}$ of a $4 \times 4$ matrix whose solutions are the matrices in the image of $\varphi$.


Miscellaneous Exercises 
M.1. Let $G = SL_2(\mathbb{R})$, let $A = \begin{bmatrix} x & y \\ z & w \end{bmatrix}$ be a matrix in G, and let t be its trace. Substituting
$t - x$ for w, the condition $\det A = 1$ becomes $x(t - x) - yz = 1$. For fixed trace t, the
locus of solutions of this equation is a quadric in x, y, z-space. Describe the quadrics that
arise this way, and decompose them into conjugacy classes.


*M.2. Which elements of SL2(R) lie on a one-parameter group? 
M.3. Are the conjugacy classes in a path connected group G path connected?


M.4. Quaternions are expressions of the form $\alpha = a + bi + cj + dk$, where a, b, c, d are real
numbers (see (9.3.3)).
(a) Let $\bar\alpha = a - bi - cj - dk$. Compute $\alpha\bar\alpha$.
(b) Prove that every $\alpha \ne 0$ has a multiplicative inverse.
(c) Prove that the set of quaternions $\alpha$ such that $a^2 + b^2 + c^2 + d^2 = 1$ forms a group
under multiplication that is isomorphic to $SU_2$.


M.5. The affine group $A_n$ is the group of transformations of $\mathbb{R}^n$ generated by $GL_n$ and the
group $T_n$ of translations: $t_a(x) = x + a$. Prove that $T_n$ is a normal subgroup of $A_n$ and
that $A_n/T_n$ is isomorphic to $GL_n$.


M.6. (Cayley transform) Let U denote the set of matrices A such that $I + A$ is invertible, and
define $A' = (I - A)(I + A)^{-1}$.
(a) Prove that if A is in U, then so is $A'$, and that $(A')' = A$.
(b) Let V denote the vector space of real skew-symmetric $n \times n$ matrices. Prove that the
rule $A \rightsquigarrow (I - A)(I + A)^{-1}$ defines a homeomorphism from a neighborhood of 0 in
V to a neighborhood of I in $SO_n$.
(c) Is there an analogous statement for the unitary group?
(d) Let $S = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}$. Show that a matrix A in U is symplectic if and only if $(A')^tS = -SA'$.


M.7. Let $G = SL_2$. A ray in $\mathbb{R}^2$ is a half line leading from the origin to infinity. The rays are in
bijective correspondence with the points on the unit 1-sphere in $\mathbb{R}^2$.
(a) Determine the stabilizer H of the ray $\{re_1 \mid r > 0\}$.
(b) Prove that the map $f: H \times SO_2 \to G$ defined by $f(P, B) = PB$ is a homeomorphism
(not a homomorphism).
(c) Use (b) to identify the topological structure of $SL_2$.


M.8. Two-dimensional space-time is the space of real three-dimensional column vectors, with
the Lorentz form $\langle Y, Y'\rangle = Y^tI_{2,1}Y' = y_1y_1' + y_2y_2' - y_3y_3'$.
The space W of real trace-zero $2 \times 2$ matrices has a basis $B = (w_1, w_2, w_3)$, where

$w_1 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \quad w_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad w_3 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.$

(a) Show that if $A = BY$ and $A' = BY'$ are trace-zero matrices, the Lorentz form carries
over to $\langle A, A'\rangle = y_1y_1' + y_2y_2' - y_3y_3' = \frac{1}{2}\operatorname{trace}(AA')$.
(b) The group $SL_2$ operates by conjugation on the space W. Use this operation to define
a homomorphism $\varphi: SL_2 \to O_{2,1}$ whose kernel is $\{\pm I\}$.
*(c) Prove that the Lorentz group $O_{2,1}$ has four connected components and that the
image of $\varphi$ is the component that contains the identity.


M.9. The icosahedral group is a subgroup of index 2 in the group $G_1$ of all symmetries of
a dodecahedron, including orientation-reversing symmetries. The alternating group $A_5$
is a subgroup of index 2 of the symmetric group $G_2 = S_5$. Finally, consider the spin
homomorphism $\varphi: SU_2 \to SO_3$. Let $G_3$ be the inverse image of the icosahedral group in
$SU_2$. Are any of the groups $G_i$ isomorphic?


*M.10. Let P be the matrix (9.3.1) in $SU_2$, and let T denote the subgroup of $SU_2$ of diagonal
matrices. Prove that if the entries a, b of P are not zero, then the double coset TPT
is homeomorphic to a torus, and describe the remaining double cosets (see Chapter 2,
Exercise M.9).


*M.11. The adjoint representation of a linear group G is the representation by conjugation on its
Lie algebra: $G \times L \to L$ defined by $P, A \rightsquigarrow PAP^{-1}$. The form $\langle A, A'\rangle = \operatorname{trace}(AA')$ on
L is called the Killing form. For the following groups, verify that if P is in G and A is in
L, then $PAP^{-1}$ is in L. Prove that the Killing form is symmetric and bilinear and that the
operation is compatible with the form, i.e., that $\langle A, A'\rangle = \langle PAP^{-1}, PA'P^{-1}\rangle$.
(a) $U_n$, (b) $O_{3,1}$, (c) $SO_n(\mathbb{C})$, (d) $SP_{2n}$.


*M.12. Determine the signature of the Killing form (Exercise M.11) on the Lie algebra of
(a) $SU_n$, (b) $SO_n$, (c) $SL_n$.


*M.13. Use the adjoint representation of $SL_2(\mathbb{C})$ (Exercise M.11) to define an isomorphism
$SL_2(\mathbb{C})/\{\pm I\} \approx SO_3(\mathbb{C})$.


CHAPTER 10


Group Representations 


A tremendous effort has been made by mathematicians 
for more than a century to clear up the chaos in group theory. 
Still, we cannot answer some of the simplest questions. 


—Richard Brauer 


Group representations arise in mathematics and in other sciences when a structure with 
symmetry is being studied. If one makes all possible measurements of some sort (in 
chemistry, it might be vibrations of a molecule) and assembles the results into a "state
vector," a symmetry of the molecule will transform that vector. This produces an operation
of the symmetry group on the space of vectors, a representation of the group, that can help 
to analyze the structure. 


10.1 DEFINITIONS 


In this chapter, $GL_n$ denotes the complex general linear group $GL_n(\mathbb{C})$.
A matrix representation of a group G is a homomorphism


(10.1.1)  $R: G \to GL_n$


from G to one of the complex general linear groups. The number n is the dimension of the
representation.

We use the notation $R_g$ instead of $R(g)$ for the image of a group element g. Each $R_g$
is an invertible matrix, and the statement that R is a homomorphism reads


(10.1.2)  $R_{gh} = R_gR_h$.

If a group is given by generators and relations, say $\langle x_1, \ldots, x_n \mid r_1, \ldots, r_k\rangle$, a matrix
representation can be defined by assigning matrices $R_{x_1}, \ldots, R_{x_n}$ that satisfy the relations.


For example, the symmetric group $S_3$ can be presented as $\langle x, y \mid x^3, y^2, xyxy\rangle$, so a
representation of $S_3$ is defined by matrices $R_x$ and $R_y$ such that $R_x^3 = I$, $R_y^2 = I$, and
$R_xR_yR_xR_y = I$. Some relations in addition to these required ones may hold.

Because $S_3$ is isomorphic to the dihedral group $D_3$, it has a two-dimensional matrix
representation that we denote by A. We place an equilateral triangle with its center at
the origin, and so that one vertex is on the $e_1$-axis. Then its group of symmetries will be
generated by the rotation $A_x$ with angle $2\pi/3$ and the reflection $A_y$ about the $e_1$-axis. With
$c = \cos 2\pi/3$ and $s = \sin 2\pi/3$,


(10.1.3)  $A_x = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}, \quad A_y = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$


We call this the standard representation of the dihedral group $D_3$ and of $S_3$.
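As a quick numerical sanity check (a sketch, not part of the text's argument), one can confirm that the matrices in (10.1.3) satisfy the three defining relations of $S_3$. The helper names `matmul` and `is_identity` are illustrative choices, and the comparison uses a small tolerance because the entries are floating-point numbers.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_identity(m, tol=1e-12):
    """Check that m equals the identity matrix up to rounding error."""
    n = len(m)
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))

c, s = math.cos(2 * math.pi / 3), math.sin(2 * math.pi / 3)
A_x = [[c, -s], [s, c]]            # rotation by the angle 2*pi/3
A_y = [[1.0, 0.0], [0.0, -1.0]]    # reflection about the e_1-axis

x_cubed = matmul(A_x, matmul(A_x, A_x))   # should be I  (relation x^3)
y_squared = matmul(A_y, A_y)              # should be I  (relation y^2)
xy = matmul(A_x, A_y)
xyxy = matmul(xy, xy)                     # should be I  (relation xyxy)

relations_hold = is_identity(x_cubed) and is_identity(y_squared) and is_identity(xyxy)
print(relations_hold)
```

Because the relations hold, the assignment $x \mapsto A_x$, $y \mapsto A_y$ extends to a homomorphism from $S_3$, as described above.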


• A representation R is faithful if the homomorphism $R: G \to GL_n$ is injective, and therefore
maps G isomorphically to its image, a subgroup of $GL_n$. The standard representation
of $S_3$ is faithful.


Our second representation of $S_3$ is the one-dimensional sign representation $\Sigma$. Its value
on a group element is the $1 \times 1$ matrix whose entry is the sign of the permutation:


(10.1.4)  $\Sigma_x = [1], \quad \Sigma_y = [-1]$.


This is not a faithful representation.
Finally, every group has the trivial representation, the one-dimensional representation
that takes the value 1 identically:


(10.1.5)  $T_x = [1], \quad T_y = [1]$.


There are other representations of $S_3$, including the representation by permutation
matrices and the representation as a group of rotations of $\mathbb{R}^3$. But we shall see that every
representation of this group can be built up out of the three representations A, $\Sigma$, and T.


Because they involve several matrices, each of which may have many entries, repre- 
sentations are notationally complicated. The secret to understanding them is to throw out 
most of the information that the matrices contain, keeping only one essential part, its trace, 
or character. 


• The character $\chi_R$ of a matrix representation R is the complex-valued function whose
domain is the group G, defined by $\chi_R(g) = \operatorname{trace} R_g$.


Characters are usually denoted by $\chi$ ('chi'). The characters of the three representations
of the symmetric group that we have defined are displayed below in tabular form, with the
group elements listed in their usual order.


(10.1.6)
            1     x    x²    y    xy   x²y
  χ_T       1     1     1    1     1     1
  χ_Σ       1     1     1   −1    −1    −1
  χ_A       2    −1    −1    0     0     0


Several interesting phenomena can be observed in this table: 


• The rows form orthogonal vectors of squared length six, which is also the order of $S_3$. The
columns that belong to different conjugacy classes are orthogonal too.
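The orthogonality observation can be checked directly from Table (10.1.6). The following is a small sketch: the variable names are illustrative, and since the columns for conjugate elements coincide, the column check takes one column per conjugacy class (for the elements 1, x, y).

```python
# Rows of Table (10.1.6): values on 1, x, x^2, y, xy, x^2 y.
chi_T = [1, 1, 1, 1, 1, 1]
chi_Sigma = [1, 1, 1, -1, -1, -1]
chi_A = [2, -1, -1, 0, 0, 0]
rows = [chi_T, chi_Sigma, chi_A]

def dot(u, v):
    """Ordinary dot product; all table entries here are real."""
    return sum(a * b for a, b in zip(u, v))

# Gram matrix of the rows: 6 on the diagonal, 0 elsewhere.
row_products = [[dot(r, s) for s in rows] for r in rows]

# One column per conjugacy class: the columns of 1, x, and y (indices 0, 1, 3).
columns = [[row[j] for row in rows] for j in (0, 1, 3)]
column_products = [[dot(p, q) for q in columns] for p in columns]

print(row_products)
print(column_products)
```

The off-diagonal entries of both Gram matrices are zero, and every row has squared length 6, the order of the group.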


These astonishing facts illustrate the beautiful Main Theorem 10.4.6 on characters.
Two other phenomena are more elementary:

• $\chi_R(1)$ is the dimension of the representation, also called the dimension of the character.


Since a representation is a homomorphism, it sends the identity in the group to the identity
matrix. So $\chi_R(1)$ is the trace of the identity matrix.


• The characters are constant on conjugacy classes.


(The conjugacy classes in $S_3$ are the sets $\{1\}$, $\{x, x^2\}$, and $\{y, xy, x^2y\}$.)

This phenomenon is explained as follows: Let g and g′ be conjugate elements of a
group G, say $g' = hgh^{-1}$. Because a representation R is a homomorphism, $R_{g'} = R_hR_gR_h^{-1}$.
So $R_g$ and $R_{g'}$ are conjugate matrices. Conjugate matrices have the same trace.
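For completeness, the last step can be written out in one line. It uses only the standard identity $\operatorname{trace}(XY) = \operatorname{trace}(YX)$, applied with $X = PA$ and $Y = P^{-1}$:

```latex
\operatorname{trace}(PAP^{-1})
  = \operatorname{trace}\bigl((PA)P^{-1}\bigr)
  = \operatorname{trace}\bigl(P^{-1}(PA)\bigr)
  = \operatorname{trace}(A).
```

Taking $P = R_h$ and $A = R_g$ gives $\chi_R(hgh^{-1}) = \chi_R(g)$.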


It is essential to work as much as possible without fixing a basis, and to facilitate this, 
we introduce the concept of a representation of a group on a vector space V. We denote by 


(10.1.7) GL(V) 


the group of invertible linear operators on V, the law of composition being composition of 
operators. We always assume that V is a finite-dimensional complex vector space, and not 
the zero space. 


• A representation of a group G on a complex vector space V is a homomorphism
(10.1.8)  $\rho: G \to GL(V)$.


So a representation assigns a linear operator to every group element. A matrix representation
can be thought of as a representation of G on the space of column vectors.

The elements of a finite rotation group (6.12) are rotations of a three-dimensional
Euclidean space V without reference to a basis, and these orthogonal operators give us what
we call the standard representation of the group. (We use this term in spite of the fact that,
for $D_3$, it conflicts with (10.1.3).) We also use the symbol $\rho$ for other representations, and
this will not imply that the operators $\rho_g$ are rotations.


If $\rho$ is a representation, we denote the image of an element g in GL(V) by $\rho_g$ rather
than by $\rho(g)$, to keep the symbol g out of the way. The result of applying $\rho_g$ to a vector v
will be written as


$\rho_g(v)$  or as  $\rho_gv$.
Since $\rho$ is a homomorphism,


(10.1.9)  $\rho_{gh} = \rho_g\rho_h$.


The choice of a basis $B = (v_1, \ldots, v_n)$ for a vector space V defines an isomorphism
from GL(V) to the general linear group $GL_n$:


(10.1.10)  $GL(V) \to GL_n$,  $T \rightsquigarrow$ matrix of T,


and a representation $\rho$ defines a matrix representation R, by the rule


(10.1.11)  $\rho_g \rightsquigarrow$ its matrix $= R_g$.

Thus every representation of G on a finite-dimensional vector space can be made into a
matrix representation, if we are willing to choose a basis. We may want to choose a basis in
order to make explicit calculations, but we must determine which properties are independent
of the basis, and which bases are the good ones.

A change of basis in V by a matrix P changes the matrix representation R associated
to $\rho$ to a conjugate representation $R' = P^{-1}RP$, i.e.,


(10.1.12)  $R'_g = P^{-1}R_gP$,
with the same P for every g in G. This follows from Rule 4.3.5 for a change of basis.


• An operation of a group G by linear operators on a vector space V is an operation on the
underlying set:


(10.1.13)  $1v = v$  and  $(gh)v = g(hv)$,


and in addition every group element acts as a linear operator. Writing out what this means,
we obtain the rules


(10.1.14)  $g(v + v') = gv + gv'$  and  $g(cv) = cgv$,


which, when added to (10.1.13), give a complete list of axioms for such an operation. We can 
speak of orbits and stabilizers as before. 

The two concepts "operation by linear operators on V" and "representation on V"
are equivalent. Given a representation $\rho$ of G on V, we can define an operation of G on
V by


(10.1.15)  $gv = \rho_g(v)$.


Conversely, given an operation, the same formula can be used to define the operator $\rho_g$.

We now have two notations (10.1.15) for the operation of g on v, and we use them
interchangeably. The notation gv is more compact, so we use it when possible, though it is
ambiguous because it doesn't specify $\rho$.


• An isomorphism from one representation $\rho: G \to GL(V)$ of a group G to another
representation $\rho': G \to GL(V')$ is an isomorphism of vector spaces $T: V \to V'$, an
invertible linear transformation, that is compatible with the operations of G:


(10.1.16)  $T(gv) = gT(v)$


for all v in V and all g in G. If $T: V \to V'$ is an isomorphism, and if B and B′ are
corresponding bases of V and V′, the associated matrix representations $R_g$ and $R'_g$ will be
equal.


The main topic of the chapter is the determination of the isomorphism classes 
of complex representations of a group G, representations on finite-dimensional, nonzero 
complex vector spaces. Any real matrix representation, such as one of the representations of 
S3 described above, can be used to define a complex representation, simply by interpreting 
the real matrices as complex matrices. We will do this without further comment. And except 
in the last section, our groups will be finite.


10.2 IRREDUCIBLE REPRESENTATIONS


Let $\rho$ be a representation of a finite group G on the (nonzero, finite-dimensional) complex
vector space V. A vector v is G-invariant if the operation of every group element fixes the
vector:


(10.2.1)  $gv = v$  or  $\rho_g(v) = v$,  for all g in G.


Most vectors aren't G-invariant. However, starting with any vector v, one can produce a
G-invariant vector by averaging over the group. Averaging is an important procedure that
will be used often. We used it once before, in Chapter 6, to find a fixed point of a finite group
operation on the plane. The G-invariant averaged vector is


(10.2.2)  $\tilde v = \frac{1}{|G|}\sum_{g \in G} gv$.


The reason for the normalization factor $\frac{1}{|G|}$ is that, if v happens to be G-invariant itself, then
$\tilde v = v$.

We verify that $\tilde v$ is G-invariant: Since the symbol g is used in the summation (10.2.2),
we write the condition for G-invariance as $h\tilde v = \tilde v$ for all h in G. The proof is based on
the fact that left multiplication by h defines a bijective map from G to itself. We make the
substitution $g' = hg$. Then as g runs through the elements of the group G, g′ does too,
though in a different order, and


(10.2.3)  $h\tilde v = \frac{1}{|G|}\sum_{g \in G} hgv = \frac{1}{|G|}\sum_{g' \in G} g'v = \frac{1}{|G|}\sum_{g \in G} gv = \tilde v$.


This reasoning can be confusing when one sees it for the first time, so we illustrate it
by an example, with $G = S_3$. We list the elements of the group as usual: $g = 1, x, x^2,
y, xy, x^2y$. Let $h = y$. Then $g' = hg$ lists the group in the order $g' = y, x^2y, xy, 1, x^2, x$. So


$\sum_{g \in G} g'v = yv + x^2yv + xyv + 1v + x^2v + xv = \sum_{g \in G} gv.$

The fact that multiplication by h is bijective implies that g′ will always run over the group in
some order. Please study this reindexing trick.
The averaging process may fail to yield an interesting vector. It is possible that $\tilde v = 0$.
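The averaging formula (10.2.2) can be tried out concretely. The sketch below assumes, purely for illustration, that $S_3$ operates on three-dimensional column vectors by permuting coordinates; the function names `act` and `average` are hypothetical. It shows one vector whose average is a nonzero invariant vector and one whose average is zero.

```python
from fractions import Fraction
from itertools import permutations

def act(perm, v):
    """Permutation operator: the entry v[i] is moved to position perm[i]."""
    out = [0] * len(v)
    for i, p in enumerate(perm):
        out[p] = v[i]
    return out

def average(v):
    """Return (1/|G|) * (sum of gv over all six permutations of the coordinates)."""
    group = list(permutations(range(3)))
    total = [Fraction(0)] * 3
    for g in group:
        total = [t + c for t, c in zip(total, act(g, v))]
    return [t / len(group) for t in total]

v_avg = average([Fraction(1), Fraction(2), Fraction(6)])    # a nonzero invariant vector
w_avg = average([Fraction(1), Fraction(-1), Fraction(0)])   # the average is zero

# The averaged vector is fixed by every group element.
v_avg_is_invariant = all(act(g, v_avg) == v_avg for g in permutations(range(3)))
print(v_avg, w_avg, v_avg_is_invariant)
```

In this example every averaged vector is a multiple of $(1, 1, 1)^t$, and the second input illustrates the remark that the average can be zero.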


Next, we turn to G-invariant subspaces. 


• Let $\rho$ be a representation of G on V. A subspace W of V is called G-invariant if gw is in
W for all w in W and g in G. So the operation by a group element must carry W to itself:
For all g,


(10.2.4)  $gW \subset W$,  or  $\rho_gW \subset W$.


This is an extension of the concept of T-invariant subspace that was introduced in Section 4.4.
Here we ask that W be an invariant subspace for each of the operators $\rho_g$.

When W is G-invariant, we can restrict the operation of G to obtain a representation 
of G on W. 


Lemma 10.2.5 If W is an invariant subspace of V, then $gW = W$ for all g in G.

Proof. Since group elements are invertible, their operations on V are invertible. So gW and
W have the same dimension. If $gW \subset W$, then $gW = W$. □


• If V is the direct sum of G-invariant subspaces, say $V = W_1 \oplus W_2$, the representation $\rho$
on V is called the direct sum of its restrictions to $W_1$ and $W_2$, and we write


(10.2.6)  $\rho = \alpha \oplus \beta$,


where $\alpha$ and $\beta$ denote the restrictions of $\rho$ to $W_1$ and $W_2$, respectively. Suppose that this
is the case, and let $B = (B_1, B_2)$ be a basis of V obtained by listing bases of $W_1$ and $W_2$ in
succession. Then the matrix of $\rho_g$ will have the block form


(10.2.7)  $R_g = \begin{bmatrix} A_g & 0 \\ 0 & B_g \end{bmatrix}$,


where $A_g$ is the matrix of $\alpha_g$ and $B_g$ is the matrix of $\beta_g$, with respect to the chosen bases. The
zeros below the block $A_g$ reflect the fact that the operation of g does not spill vectors out of
the subspace $W_1$, and the zeros above the block $B_g$ reflect the analogous fact for $W_2$.

Conversely, if R is a matrix representation and if all of the matrices $R_g$ have a block
form (10.2.7), with $A_g$ and $B_g$ square, we say that the matrix representation R is the direct
sum $A \oplus B$.

For example, since the symmetric group $S_3$ is isomorphic to the dihedral group $D_3$,
it is a rotation group, a subgroup of $SO_3$. We choose coordinates so that x acts on $\mathbb{R}^3$ as a
rotation with angle $2\pi/3$ about the $e_3$-axis, and y acts as a rotation by $\pi$ about the $e_1$-axis.
This gives us a three-dimensional matrix representation M:


(10.2.8)  $M_x = \begin{bmatrix} c & -s & 0 \\ s & c & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad M_y = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix},$


with $c = \cos 2\pi/3$ and $s = \sin 2\pi/3$. We see that M has a block decomposition, and that it is
the direct sum $A \oplus \Sigma$ of the standard representation and the sign representation.


Even when a representation $\rho$ is a direct sum, the matrix representation obtained
using a basis will not have a block form unless the basis is compatible with the direct sum
decomposition. Until we have made a further analysis, it may be difficult to tell that a
representation is a direct sum, when it is presented using the wrong basis. But if we find such
a decomposition of our representation $\rho$, we may try to decompose the summands $\alpha$ and $\beta$
further, and we may continue until no further decomposition is possible.


• If $\rho$ is a representation of a group G on V and if V has no proper G-invariant subspace, $\rho$
is called an irreducible representation. If V has a proper G-invariant subspace, $\rho$ is reducible.


The standard representation of $S_3$ is irreducible.

Suppose that our representation $\rho$ is reducible, and let W be a proper G-invariant
subspace of V. Let $\alpha$ be the restriction of $\rho$ to W. We extend a basis of W to a basis of V,
say $B = (w_1, \ldots, w_k; v_{k+1}, \ldots, v_d)$. The matrix of $\rho_g$ will have the block form


(10.2.9)  $R_g = \begin{bmatrix} A_g & * \\ 0 & B_g \end{bmatrix}$,

where $A_g$ is the matrix of $\alpha_g$ and $B_g$ is some other matrix representation of G. I think of the
block indicated by ∗ as "junk." Maschke's theorem, which is below, tells us that we can get
rid of that junk. But to do so we must choose the basis more carefully.


Theorem 10.2.10 Maschke’s Theorem. Every representation of a finite group G on a 
nonzero, finite-dimensional complex vector space is a direct sum of irreducible representa- 
tions. 


This theorem will be proved in the next section. We'll illustrate it here by one more
example in which G is the symmetric group $S_3$. We consider the representation of $S_3$ by the
permutation matrices that correspond to the permutations x = (123) and y = (12). Let's
denote this representation by N:


(10.2.11)  $N_x = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad N_y = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$


There is no block decomposition for this pair of matrices. However, the vector
$w_1 = (1, 1, 1)^t$ is fixed by both matrices, so it is G-invariant, and the one-dimensional
subspace W spanned by $w_1$ is also G-invariant. The restriction of N to this subspace is the
trivial representation T. Let's change the standard basis of $\mathbb{C}^3$ to the basis $B = (w_1, e_2, e_3)$.
With respect to this new basis, the representation N is changed as follows:


$P = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}, \quad P^{-1}N_xP = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & -1 \\ 0 & 1 & -1 \end{bmatrix}, \quad P^{-1}N_yP = \begin{bmatrix} 1 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & -1 & 1 \end{bmatrix}.$

The upper right blocks aren't zero, so we don't have a decomposition of the representation
as a direct sum.
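The change of basis can be recomputed mechanically. In the sketch below, P is the matrix whose columns are $w_1, e_2, e_3$; its inverse is written out by hand and checked inside the code, and `matmul` is an illustrative helper. The output exhibits the nonzero entries in the upper right blocks.

```python
def matmul(a, b):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
N_x = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]   # permutation matrix of x = (1 2 3)
N_y = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # permutation matrix of y = (1 2)

# Columns of P are the new basis vectors w_1 = (1,1,1)^t, e_2, e_3.
P = [[1, 0, 0], [1, 1, 0], [1, 0, 1]]
P_inv = [[1, 0, 0], [-1, 1, 0], [-1, 0, 1]]
assert matmul(P, P_inv) == I3             # confirm the hand-computed inverse

Nx_new = matmul(P_inv, matmul(N_x, P))
Ny_new = matmul(P_inv, matmul(N_y, P))

# Entries to the right of the 1x1 block in the first row (the "junk").
junk_x = Nx_new[0][1:]
junk_y = Ny_new[0][1:]
print(Nx_new, Ny_new)
```

The first column of each new matrix is $(1, 0, 0)^t$, reflecting the invariance of W, while the first rows still contain nonzero entries outside the diagonal block.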

There is a better approach: The matrices $N_x$ and $N_y$ are unitary, so $N_g$ is unitary
for all g in G. (They are orthogonal, but we are considering complex representations.)
Unitary matrices preserve orthogonality. Since W is G-invariant, the orthogonal space $W^\perp$
is G-invariant too (see (10.3.4)). If we form a basis by choosing vectors $w_2$ and $w_3$ from
$W^\perp$, the junk disappears. The permutation representation N is isomorphic to the direct sum
$T \oplus A$. We'll soon have techniques that make verifying this extremely simple, so we won't
bother doing so here.
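To see the junk disappear in a concrete case, here is a sketch with one possible choice of vectors in $W^\perp$, namely $w_2 = (1, -1, 0)^t$ and $w_3 = (0, 1, -1)^t$. This choice, the hand-computed inverse, and the helper `matmul` are assumptions made for illustration; the basis is not orthonormal, so the resulting matrices are block diagonal but not unitary.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
N_x = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]   # permutation matrix of x = (1 2 3)
N_y = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # permutation matrix of y = (1 2)

# Columns of P: w_1 = (1,1,1)^t, and w_2, w_3 chosen orthogonal to w_1.
P = [[1, 1, 0], [1, -1, 1], [1, 0, -1]]
P_inv = [[F(1, 3), F(1, 3), F(1, 3)],
         [F(2, 3), F(-1, 3), F(-1, 3)],
         [F(1, 3), F(1, 3), F(-2, 3)]]
assert matmul(P, P_inv) == I3             # confirm the hand-computed inverse

Nx_new = matmul(P_inv, matmul(N_x, P))
Ny_new = matmul(P_inv, matmul(N_y, P))

def is_block_diagonal(m):
    """True if m splits as a 1x1 block followed by a 2x2 block."""
    return m[0][1] == 0 and m[0][2] == 0 and m[1][0] == 0 and m[2][0] == 0

print(is_block_diagonal(Nx_new), is_block_diagonal(Ny_new))
```

The traces of the lower $2 \times 2$ blocks are $-1$ for x and 0 for y, which agrees with the row $\chi_A$ of Table (10.1.6).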

This decomposition of the representation using orthogonal spaces illustrates a general 
method that we investigate next. 


10.3. UNITARY REPRESENTATIONS 


Let V be a Hermitian space - a complex vector space together with a positive definite
Hermitian form $\langle\ ,\ \rangle$. A unitary operator T on V is a linear operator with the property


(10.3.1)  $\langle Tv, Tw\rangle = \langle v, w\rangle$


for all v and w in V (8.6.3). If A is the matrix of a linear operator T with respect to an
orthonormal basis, then T is unitary if and only if A is a unitary matrix: $A^* = A^{-1}$.

• A representation $\rho: G \to GL(V)$ on a Hermitian space V is called unitary if $\rho_g$ is a
unitary operator for every g. We can write this condition as


(10.3.2)  $\langle gv, gw\rangle = \langle v, w\rangle$  or  $\langle\rho_gv, \rho_gw\rangle = \langle v, w\rangle$,


for all v and w in V and all g in G. Similarly, a matrix representation $R: G \to GL_n$
is unitary if $R_g$ is a unitary matrix for every g in G. A unitary matrix representation is a
homomorphism from G to the unitary group:


(10.3.3)  $R: G \to U_n$.


A representation $\rho$ on a Hermitian space will be unitary if and only if the matrix
representation obtained using an orthonormal basis is unitary.


Lemma 10.3.4 Let $\rho$ be a unitary representation of G on a Hermitian space V, and let W be
a G-invariant subspace. The orthogonal complement $W^\perp$ is also G-invariant, and $\rho$ is the
direct sum of its restrictions to the Hermitian spaces W and $W^\perp$. These restrictions are also
unitary representations.


Proof. It is true that $V = W \oplus W^\perp$ (8.5.1). Since $\rho$ is unitary, it preserves orthogonality: If
W is invariant and $u \perp W$, then $gu \perp gW = W$. This means that if $u \in W^\perp$, then $gu \in W^\perp$. □


The next corollary follows from the lemma by induction.


Corollary 10.3.5 Every unitary representation $\rho: G \to GL(V)$ on a Hermitian vector space
V is an orthogonal sum of irreducible representations. □


The trick now is to turn the condition (10.3.2) for a unitary representation around, and
think of it as a condition on the form instead of on the representation. Suppose we are given
a representation $\rho: G \to GL(V)$ on a vector space V, and let $\langle\ ,\ \rangle$ be a positive definite
Hermitian form on V. We say that the form is G-invariant if (10.3.2) holds. This is exactly
the same as saying that the representation is unitary, when we use the form to make V into
a Hermitian space. But if only the representation $\rho$ is given, we are free to choose the form.


Theorem 10.3.6 Let $\rho: G \to GL(V)$ be a representation of a finite group on a vector space
V. There exists a G-invariant, positive definite Hermitian form on V.


Proof. We begin with an arbitrary positive definite Hermitian form on V that we denote by
$\{\ ,\ \}$. For example, we may choose a basis for V and use it to transfer the standard Hermitian
form $X^*Y$ on $\mathbb{C}^n$ over to V. Then we use the averaging process to construct another form.
The averaged form is defined by


(10.3.7)  $\langle v, w\rangle = \frac{1}{|G|}\sum_{g \in G}\{gv, gw\}$.


We claim that this form is Hermitian, positive definite, and G-invariant. The verifications of
these properties are easy. We omit the first two, but we will verify G-invariance. The proof
is almost identical to the one used to show that averaging produces a G-invariant vector
(10.2.3), except that it is based here on the fact that right multiplication by an element h of
G defines a bijective map $G \to G$.

Let h be an element of G. We must show that $\langle hv, hw\rangle = \langle v, w\rangle$ for all v and
w in V (10.3.2). We make the substitution $g' = gh$. As g runs over the group, so
does g′. Then


$\langle hv, hw\rangle = \frac{1}{|G|}\sum_{g}\{ghv, ghw\} = \frac{1}{|G|}\sum_{g'}\{g'v, g'w\} = \frac{1}{|G|}\sum_{g}\{gv, gw\} = \langle v, w\rangle$. □
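A small worked instance of the averaged form (10.3.7) may help. The sketch below assumes, for illustration only, the two-element group generated by the real matrix $R = \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}$, which satisfies $R^2 = I$ but is not orthogonal. Starting from the standard form, the averaged form has matrix $B = \frac{1}{|G|}\sum_g g^tg$ (the entries are real, so the adjoint is the transpose), and invariance amounts to $h^tBh = B$ for every h in the group.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

I2 = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
R = [[Fraction(1), Fraction(1)], [Fraction(0), Fraction(-1)]]
assert matmul(R, R) == I2          # R has order 2, so {I, R} is a group
group = [I2, R]

# Matrix of the averaged form: B = (1/|G|) * sum over g of g^t g.
B = [[sum(matmul(transpose(g), g)[i][j] for g in group) / len(group)
      for j in range(2)] for i in range(2)]

# The standard form is not invariant (R^t R != I), but the averaged one is.
standard_is_invariant = matmul(transpose(R), R) == I2
averaged_is_invariant = all(matmul(transpose(h), matmul(B, h)) == B for h in group)
print(B, standard_is_invariant, averaged_is_invariant)
```

Here B has positive diagonal entries and positive determinant, so the averaged form is positive definite, as the theorem asserts.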
g 
Theorem 10.3.6 has remarkable consequences:


Corollary 10.3.8 


(a) (Maschke’s Theorem): Every representation of a finite group G is a direct sum of 
irreducible representations. 

(b) Let $\rho: G \to GL(V)$ be a representation of a finite group G on a vector space V. There
exists a basis B of V such that the matrix representation R obtained from $\rho$ using this
basis is unitary.

(c) Let $R: G \to GL_n$ be a matrix representation of a finite group G. There is an invertible
matrix P such that $R'_g = P^{-1}R_gP$ is unitary for all g, i.e., such that R′ is a homomorphism
from G to the unitary group $U_n$.

(d) Every finite subgroup of $GL_n$ is conjugate to a subgroup of the unitary group $U_n$.


Proof. (a) This follows from Theorem 10.3.6 and Corollary 10.3.5.


(b) Given $\rho$, we choose a G-invariant positive definite Hermitian form on V, and we take
for B an orthonormal basis with respect to this form. The associated matrix representation
will be unitary.


(c) This is the matrix form of (b), and it is derived in the usual way, by viewing R as a
representation on the space $\mathbb{C}^n$ and then changing basis.


(d) This is obtained from (c) by viewing the inclusion of a subgroup H into $GL_n$ as a matrix
representation of H. □


This corollary provides another proof of Theorem 4.7.14:
Corollary 10.3.9 Every matrix A of finite order in $GL_n(\mathbb{C})$ is diagonalizable.


Proof. The matrix A generates a finite cyclic subgroup of $GL_n$. By Corollary 10.3.8(d), this
subgroup is conjugate to a subgroup of the unitary group. Hence A is conjugate to a unitary
matrix. The Spectral Theorem 8.6.8 tells us that a unitary matrix is diagonalizable. Therefore
A is diagonalizable. □


10.4 CHARACTERS 


As mentioned in the first section, one works almost exclusively with characters, one reason
being that representations are complicated. The character $\chi$ of a representation $\rho$ is the
complex-valued function whose domain is the group G, defined by
(10.4.1)  $\chi(g) = \operatorname{trace}\rho_g$.


If R is the matrix representation obtained from $\rho$ by a choice of basis, then $\chi$ is also
the character of R. The dimension of the vector space V is called the dimension of the
representation $\rho$, and also the dimension of its character $\chi$. The character of an irreducible
representation is called an irreducible character.


Here are some basic properties of the character. 


Proposition 10.4.2 Let $\chi$ be the character of a representation $\rho$ of a finite group G.


(a) $\chi(1)$ is the dimension of $\chi$.

(b) The character is constant on conjugacy classes: If $g' = hgh^{-1}$, then $\chi(g') = \chi(g)$.

(c) Let g be an element of G of order k. The roots of the characteristic polynomial of $\rho_g$
are powers of the k-th root of unity $\zeta = e^{2\pi i/k}$. If $\rho$ has dimension d, then $\chi(g)$ is a sum
of d such powers.

(d) $\chi(g^{-1})$ is the complex conjugate $\overline{\chi(g)}$ of $\chi(g)$.

(e) The character of a direct sum $\rho \oplus \rho'$ of representations is the sum $\chi + \chi'$ of their
characters.

(f) Isomorphic representations have the same character.

Proof. Parts (a) and (b) were discussed before, for matrix representations (see (10.1.6)).


(c) The trace of $\rho_g$ is the sum of its eigenvalues. If $\lambda$ is an eigenvalue of $\rho_g$, then $\lambda^k$ is an
eigenvalue of $\rho_g^k$, and if $g^k = 1$, then $\rho_g^k = I$ and $\lambda^k = 1$. So $\lambda$ is a power of $\zeta$.


(d) The eigenvalues $\lambda_1, \ldots, \lambda_d$ of $R_g$ have absolute value 1 because they are roots of
unity. For any complex number $\lambda$ of absolute value 1, $\lambda^{-1} = \bar\lambda$. Therefore $\chi(g^{-1}) =
\lambda_1^{-1} + \cdots + \lambda_d^{-1} = \bar\lambda_1 + \cdots + \bar\lambda_d = \overline{\chi(g)}$.


Parts (e) and (f) are obvious. □


Two things simplify the computation of a character x. First, since x is constant on 
conjugacy classes, we need only determine the value of x on one element in each class - a 
representative element. Second, since trace is independent of a basis, we may select a 
convenient basis for each individual group element to compute it. We don’t need to use the 
same basis for all elements. 


There is a Hermitian product on characters, defined by


(10.4.3)  $\langle\chi, \chi'\rangle = \frac{1}{|G|}\sum_{g}\overline{\chi(g)}\,\chi'(g)$.


When $\chi$ and $\chi'$ are viewed as vectors, as in Table 10.1.6, this is the standard Hermitian
product (8.3.3), scaled by the factor $\frac{1}{|G|}$.

It is convenient to rewrite this formula by grouping the terms for each conjugacy class.
This is permissible because the characters are constant on them. We number the conjugacy
classes arbitrarily, as $C_1, \ldots, C_r$, and we let $c_i$ denote the order of the class $C_i$. We also
choose a representative element $g_i$ in the class $C_i$. Then


(10.4.4)  $\langle\chi, \chi'\rangle = \frac{1}{|G|}\sum_{i=1}^{r} c_i\,\overline{\chi(g_i)}\,\chi'(g_i)$.
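We go back to our usual example: Let G be the symmetric group $S_3$. Its class equation
is $6 = 1 + 2 + 3$, and the elements 1, x, y represent the conjugacy classes of orders 1, 2, 3,
respectively. Then


$\langle\chi, \chi'\rangle = \frac{1}{6}\bigl(\overline{\chi(1)}\chi'(1) + 2\,\overline{\chi(x)}\chi'(x) + 3\,\overline{\chi(y)}\chi'(y)\bigr).$
Looking at Table 10.1.6, we find


(10.4.5)  $\langle\chi_A, \chi_A\rangle = \frac{1}{6}(4 + 2 + 0) = 1$  and  $\langle\chi_A, \chi_\Sigma\rangle = \frac{1}{6}(2 - 2 + 0) = 0$.


The characters $\chi_T$, $\chi_\Sigma$, $\chi_A$ are orthonormal with respect to the Hermitian product $\langle\ ,\ \rangle$.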
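The remaining products can be computed the same way. The sketch below evaluates formula (10.4.4) for all pairs among the three characters of $S_3$, using the class sizes 1, 2, 3 and the values on the representatives 1, x, y read from Table 10.1.6. The function name `inner` and the dictionary layout are illustrative choices; `conjugate()` is applied to the first argument to follow the formula, although it has no effect on these integer values.

```python
from fractions import Fraction

class_sizes = [1, 2, 3]            # orders of the classes of 1, x, y in S_3
characters = {
    "T": [1, 1, 1],
    "Sigma": [1, 1, -1],
    "A": [2, -1, 0],
}

def inner(chi, chi2):
    """Hermitian product (10.4.4): (1/|G|) * sum of c_i * conj(chi(g_i)) * chi2(g_i)."""
    order = sum(class_sizes)
    total = sum(c * a.conjugate() * b for c, a, b in zip(class_sizes, chi, chi2))
    return Fraction(total, order)

gram = {(p, q): inner(characters[p], characters[q])
        for p in characters for q in characters}
print(gram)
```

Every diagonal entry of the resulting table is 1 and every off-diagonal entry is 0, which is the orthonormality statement above.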


These computations illustrate the Main Theorem on characters. It is one of the most 
beautiful theorems of algebra, both because it is so elegant, and because it simplifies the 
problem of classifying representations so much. 


Theorem 10.4.6 Main Theorem. Let G be a finite group.


(a) (orthogonality relations) The irreducible characters of G are orthonormal: If $\chi_i$ is the
character of an irreducible representation $\rho_i$, then $\langle\chi_i, \chi_i\rangle = 1$. If $\chi_i$ and $\chi_j$ are the
characters of nonisomorphic irreducible representations $\rho_i$ and $\rho_j$, then $\langle\chi_i, \chi_j\rangle = 0$.

(b) There are finitely many isomorphism classes of irreducible representations, the same
number as the number of conjugacy classes in the group.

(c) Let $\rho_1, \ldots, \rho_r$ represent the isomorphism classes of irreducible representations of G,
and let $\chi_1, \ldots, \chi_r$ be their characters. The dimension $d_i$ of $\rho_i$ (or of $\chi_i$) divides the
order |G| of the group, and $|G| = d_1^2 + \cdots + d_r^2$.


This theorem is proved in Section 10.8, except we won’t prove that d; divides |G]. 


One should compare (ce) with the class equation. Let the conjugacy classes be 
Ci,..., Cy, and let c; = |Cj|. Then c; divides |G|, and |G| = cy +--+ + ry. 


The Main Theorem allows us to decompose any character as a linear combination of
the irreducible characters, using the formula for orthogonal projection (8.4.11). Maschke's
Theorem tells us that every representation $\rho$ is isomorphic to a direct sum of the irreducible
representations $\rho_1, \ldots, \rho_r$. We write this symbolically as


(10.4.7) $\rho \approx n_1\rho_1 \oplus \cdots \oplus n_r\rho_r,$


where $n_i$ are non-negative integers, and $n_i\rho_i$ stands for the direct sum of $n_i$ copies of $\rho_i$.

Corollary 10.4.8 Let $\rho_1, \ldots, \rho_r$ represent the isomorphism classes of irreducible
representations of a finite group $G$, and let $\rho$ be any representation of $G$. Let $\chi_i$ and $\chi$ be the
characters of $\rho_i$ and $\rho$, respectively, and let $n_i = \langle \chi, \chi_i \rangle$. Then


(a) $\chi = n_1\chi_1 + \cdots + n_r\chi_r$, and

(b) $\rho$ is isomorphic to $n_1\rho_1 \oplus \cdots \oplus n_r\rho_r$.

(c) Two representations $\rho$ and $\rho'$ of a finite group $G$ are isomorphic if and only if their
characters are equal.


Proof. Any representation $\rho$ is isomorphic to an integer combination $m_1\rho_1 \oplus \cdots \oplus m_r\rho_r$
of the representations $\rho_i$, and then $\chi = m_1\chi_1 + \cdots + m_r\chi_r$ (Lemma 10.4.2). Since the
characters $\chi_i$ are orthonormal, the projection formula shows that $m_i = n_i$. This proves (a)
and (b), and (c) follows. □


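Corollary 10.4.9 For any characters $\chi$ and $\chi'$, $\langle \chi, \chi' \rangle$ is an integer. □


Note also that, with $\chi$ as in (10.4.8)(a),

(10.4.10) $\langle \chi, \chi \rangle = n_1^2 + \cdots + n_r^2.$


Some consequences of this formula are:

$\langle \chi, \chi \rangle = 1 \Leftrightarrow \chi$ is an irreducible character,

$\langle \chi, \chi \rangle = 2 \Leftrightarrow \chi$ is the sum of two distinct irreducible characters,

$\langle \chi, \chi \rangle = 3 \Leftrightarrow \chi$ is the sum of three distinct irreducible characters,

$\langle \chi, \chi \rangle = 4 \Leftrightarrow \chi$ is either the sum of four distinct irreducible characters, or
$\chi = 2\chi_k$ for some irreducible character $\chi_k$. □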
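
As an illustration of Corollary 10.4.8 and formula (10.4.10), here is a small sketch that decomposes a character of $S_3$ into irreducibles. The test character (3, 0, 1) is the fixed-point count of $S_3$ permuting three indices, chosen here only as an example; the irreducible values are assumed to match Table 10.1.6, and the function names are hypothetical.

```python
from fractions import Fraction

# Class sizes of S3 for the representatives 1, x, y.
SIZES = [1, 2, 3]

# Irreducible characters of S3 (assumed values from Table 10.1.6).
IRREDUCIBLES = {
    "chi_T": [1, 1, 1],
    "chi_Sigma": [1, 1, -1],
    "chi_A": [2, -1, 0],
}


def inner(chi, chi_prime, sizes=SIZES):
    """Hermitian product (10.4.4); all values here are real, so no conjugation is needed."""
    total = sum(c * a * b for c, a, b in zip(sizes, chi, chi_prime))
    return Fraction(total, sum(sizes))


def multiplicities(chi):
    """n_i = <chi, chi_i> for each irreducible character chi_i."""
    return {name: inner(chi, irr) for name, irr in IRREDUCIBLES.items()}


# Example character: number of indices fixed by 1, by a 3-cycle x, by a transposition y.
PERM = [3, 0, 1]

if __name__ == "__main__":
    print(multiplicities(PERM))   # multiplicities n_i
    print(inner(PERM, PERM))      # should equal the sum of the n_i squared
```

If the multiplicities come out as non-negative integers whose squares sum to $\langle \chi, \chi \rangle$, the input is consistent with (10.4.10).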


A complex-valued function on the group, such as a character, that is constant on each
conjugacy class, is called a class function. A class function $\varphi$ can be given by assigning
arbitrary values to each conjugacy class. So the complex vector space $\mathcal{H}$ of class functions
has dimension equal to the number of conjugacy classes. We use the same product as (10.4.3)
to make $\mathcal{H}$ into a Hermitian space:


$\langle \varphi, \psi \rangle = \frac{1}{|G|} \sum_{g} \overline{\varphi(g)}\, \psi(g).$


Corollary 10.4.11 The irreducible characters form an orthonormal basis of the space $\mathcal{H}$ of
class functions.


This follows from parts (a) and (b) of the Main Theorem. The characters are independent
because they are orthonormal, and they span $\mathcal{H}$ because the dimension of $\mathcal{H}$ is equal to the
number of conjugacy classes. □


Using the Main Theorem, it becomes easy to see that $T$, $\Sigma$, and $A$ represent all of the
isomorphism classes of irreducible representations of the group $S_3$ (see Section 10.1). Since
there are three conjugacy classes, there are three irreducible representations. We verified
above (10.4.5) that $\langle \chi_A, \chi_A \rangle = 1$, so $A$ is an irreducible representation. The representations
$T$ and $\Sigma$ are obviously irreducible because they are one-dimensional. And these three
representations are not isomorphic because their characters are distinct.

The irreducible characters of a group can be assembled into a table, the character
table of the group. It is customary to list the values of the character on a conjugacy class
just once. Table 10.1.6, showing the irreducible characters of $S_3$, gets compressed into
three columns. In the table below, the three conjugacy classes in $S_3$ are described by the
representative elements $1, x, y$, and for reference, the orders of the conjugacy classes are
given above them in parentheses. We have assigned indices to the irreducible characters:
$\chi_T = \chi_1$, $\chi_\Sigma = \chi_2$, and $\chi_A = \chi_3$.


                              conjugacy class
                           (1)   (2)   (3)     order of the class
                            1     x     y      representative element
    irreducible  $\chi_1$      1     1     1
    character    $\chi_2$      1     1    -1      value of the character
                 $\chi_3$      2    -1     0


(10.4.12) Character table of the symmetric group $S_3$


In such a table, we put the trivial character, the character of the trivial representation,
into the top row. It consists entirely of 1's. The first column lists the dimensions of the
representations (10.4.2)(a).


We determine the character table of the tetrahedral group $T$ of 12 rotational symmetries
of a tetrahedron next. Let $x$ denote rotation by $2\pi/3$ about a face, and let $z$ denote rotation
by $\pi$ about the center of an edge, as in Figure 7.10.8. The conjugacy classes are $C(1)$,
$C(x)$, $C(x^2)$, and $C(z)$, and their orders are 1, 4, 4, and 3, respectively. So there are four
irreducible characters; let their dimensions be $d_i$. Then $12 = d_1^2 + \cdots + d_4^2$. The only solution
of this equation is $12 = 1^2 + 1^2 + 1^2 + 3^2$, so the dimensions of the irreducible representations
are 1, 1, 1, 3. We write the table first with undetermined entries:


              (1)   (4)   (4)   (3)
               1     x    x^2    z
    $\chi_1$     1     1     1     1
    $\chi_2$     1     a     b     c
    $\chi_3$     1     a'    b'    c'
    $\chi_4$     3     *     *     *


and we evaluate the form (10.4.4) on the orthogonal characters $\chi_1$ and $\chi_2$:

(10.4.13) $\langle \chi_1, \chi_2 \rangle = \frac{1}{12}(1 + 4a + 4b + 3c) = 0.$


Since $\chi_2$ is a one-dimensional character, $\chi_2(z) = c$ is the trace of a $1 \times 1$ matrix. It is the
unique entry in that matrix, and since $z^2 = 1$, its square is 1. So $c$ is equal to 1 or $-1$. Similarly,
since $x^3 = 1$, $\chi_2(x) = a$ will be a power of $\omega = e^{2\pi i/3}$. So $a$ is equal to 1, $\omega$, or $\omega^2$. Moreover,
$b = a^2$. Looking at (10.4.13), one sees that $a = 1$ is impossible. The possible values are
$a = \omega$ or $\omega^2$, and then $c = 1$. The same reasoning applies to the character $\chi_3$. Since $\chi_2$
and $\chi_3$ are distinct, and since we can interchange them, we may assume that $a = \omega$ and
$a' = \omega^2$. It is natural to guess that the irreducible three-dimensional character $\chi_4$ might be
the character of the standard representation of $T$ by rotations, and it is easy to verify this by
computing that character and checking that $\langle \chi, \chi \rangle = 1$. Since we know the other characters,
$\chi_4$ is also determined by the fact that the characters are orthonormal. The character table is


              (1)   (4)   (4)   (3)
               1     x    x^2    z
    $\chi_1$     1     1     1     1
    $\chi_2$     1     ω     ω^2   1
    $\chi_3$     1     ω^2   ω     1
    $\chi_4$     3     0     0    -1

(10.4.14) Character table of the tetrahedral group


The columns in these tables are orthogonal. This is a general phenomenon, whose 
proof we leave as Exercise 4.6. 
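
Both statements about table (10.4.14), orthonormal rows and orthogonal columns, can be checked numerically. The sketch below uses floating-point complex arithmetic, so equalities hold only up to rounding; the helper names are hypothetical. The comment about the diagonal column products ($|G|/c_j$) records a standard fact that this section does not state.

```python
import cmath

# Class sizes for the representatives 1, x, x^2, z of the tetrahedral group T.
SIZES = [1, 4, 4, 3]
ORDER = sum(SIZES)  # 12

w = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity

# Rows chi_1 .. chi_4 of table (10.4.14).
TABLE = [
    [1, 1, 1, 1],
    [1, w, w**2, 1],
    [1, w**2, w, 1],
    [3, 0, 0, -1],
]


def hermitian(chi, chi_prime):
    """Row product (10.4.4), weighted by the class sizes."""
    total = sum(c * complex(a).conjugate() * b
                for c, a, b in zip(SIZES, chi, chi_prime))
    return total / ORDER


def column_product(j, k):
    """Unweighted product of columns j and k: sum_i conj(chi_i(g_j)) * chi_i(g_k)."""
    return sum(complex(row[j]).conjugate() * row[k] for row in TABLE)


if __name__ == "__main__":
    for i in range(4):
        print([round(abs(hermitian(TABLE[i], TABLE[j])), 6) for j in range(4)])
    # Distinct columns should give 0; the diagonal entries come out as |G| / c_j.
    for j in range(4):
        print([round(abs(column_product(j, k)), 6) for k in range(4)])
```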


10.5 ONE-DIMENSIONAL CHARACTERS 


A one-dimensional character is the character of a representation of $G$ on a one-dimensional
vector space. If $\rho$ is a one-dimensional representation, then $\rho_g$ is represented by a $1 \times 1$
matrix $R_g$, and $\chi(g)$ is the unique entry in that matrix. Speaking loosely,


(10.5.1) $\chi(g) = \rho_g = R_g.$


A one-dimensional character $\chi$ is a homomorphism from $G$ to $GL_1 = \mathbb{C}^\times$, because


$\chi(gh) = \rho_{gh} = \rho_g \rho_h = \chi(g)\chi(h).$


If $\chi$ is one-dimensional and if $g$ is an element of $G$ of order $k$, then $\chi(g)$ is a power of the
primitive root of unity $\zeta = e^{2\pi i/k}$. And since $\mathbb{C}^\times$ is abelian, any commutator is in
the kernel of such a character.

Normal subgroups are among the many things that can be determined by looking at a
character table. The kernel of a one-dimensional character $\chi$ is the union of the conjugacy
classes $C(g)$ such that $\chi(g) = 1$. For instance, the kernel of the character $\chi_2$ in the character
table of the tetrahedral group $T$ is the union of the two conjugacy classes $C(1) \cup C(z)$. It is
a normal subgroup of order four that we have seen before.


Warning: A character of dimension greater than 1 is not a homomorphism. The values taken 
on by such a character are sums of roots of unity. 


Theorem 10.5.2 Let G be a finite abelian group. 
(a) Every irreducible character of G is one-dimensional. The number of irreducible charac- 
ters is equal to the order of the group. 


(b) Every matrix representation $R$ of $G$ is diagonalizable: There is an invertible matrix $P$
such that $P^{-1}R_gP$ is diagonal for all $g$.

Proof. In an abelian group of order $N$, there will be $N$ conjugacy classes, each containing
a single element. Then according to the Main Theorem, the number of irreducible
representations is also equal to $N$. The formula $N = d_1^2 + \cdots + d_N^2$ shows that $d_i = 1$
for all $i$. Part (b) follows from (a): by Maschke's Theorem, $R$ is isomorphic to a direct sum
of irreducible representations, each of which is one-dimensional, and in a basis adapted to
that decomposition every $R_g$ is diagonal. □


A simple example: The cyclic group $C_3 = \{1, x, x^2\}$ of order 3 has three irreducible
characters of dimension 1. If $\chi$ is one of them, then $\chi(x)$ will be a power of $\omega = e^{2\pi i/3}$, and
$\chi(x^2) = \chi(x)^2$. Since there are three distinct powers of $\omega$ and three irreducible characters,
$\chi_i(x)$ must take on all three values. The character table of $C_3$ is therefore


              (1)   (1)   (1)
               1     x    x^2
    $\chi_1$     1     1     1
    $\chi_2$     1     ω     ω^2
    $\chi_3$     1     ω^2   ω

(10.5.3) Character table of the cyclic group $C_3$
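
The same reasoning applies to any cyclic group. The sketch below builds a character table for $C_n$ under the assumption, which goes slightly beyond what the text states for $n = 3$, that the $n$ characters are $\chi_j(x^k) = \zeta^{jk}$ with $\zeta = e^{2\pi i/n}$. The function names are hypothetical, and the arithmetic is floating point.

```python
import cmath


def cyclic_character_table(n):
    """Row j, column k holds chi_j(x^k) = zeta**(j*k), where zeta = exp(2*pi*i/n)."""
    zeta = cmath.exp(2j * cmath.pi / n)
    return [[zeta ** (j * k) for k in range(n)] for j in range(n)]


def hermitian(chi, chi_prime):
    """Product (10.4.3) for an abelian group: every conjugacy class has one element."""
    n = len(chi)
    return sum(a.conjugate() * b for a, b in zip(chi, chi_prime)) / n


if __name__ == "__main__":
    table = cyclic_character_table(3)
    for row in table:
        print([complex(round(v.real, 3), round(v.imag, 3)) for v in row])
```

For $n = 3$ the rows should reproduce table (10.5.3), with row index 0 corresponding to $\chi_1$.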


10.6 THE REGULAR REPRESENTATION 


Let $S = (s_1, \ldots, s_n)$ be a finite ordered set on which a group $G$ operates, and let $R_g$ denote
the permutation matrix that describes the operation of a group element $g$ on $S$. If $g$ operates
on $S$ as the permutation $p$, i.e., if $gs_i = s_{pi}$, that matrix is (see (1.5.7))


(10.6.1) $R_g = \sum_i e_{pi,i},$


and $R_g e_i = e_{pi}$. The map $g \mapsto R_g$ defines a matrix representation $R$ of $G$ that we call a
permutation representation, though that phrase had a different meaning in Section 6.11. The
representation (10.2.11) of $S_3$ is an example of a permutation representation.

The ordering of $S$ is used only so that we can assemble $R_g$ into a matrix. It is nicer
to describe a permutation representation without reference to an ordering. To do this we
introduce a vector space $V_S$ that has the unordered basis $\{e_s\}$ indexed by elements of $S$.
Elements of $V_S$ are linear combinations $\sum_s c_s e_s$, with complex coefficients $c_s$. If we are
given an operation of $G$ on the set $S$, the associated permutation representation $\rho$ of $G$ on
$V_S$ is defined by


(10.6.2) $\rho_g(e_s) = e_{gs}.$


When we choose an ordering of $S$, the basis $\{e_s\}$ becomes an ordered basis, and the matrix
of $\rho_g$ has the form described above.
The character of a permutation representation is especially easy to compute: 


Lemma 10.6.3 Let $\rho$ be the permutation representation associated to an operation of a
group $G$ on a nonempty finite set $S$. For all $g$ in $G$, $\chi(g)$ is equal to the number of elements
of $S$ that are fixed by $g$.


Proof. We order the set $S$ arbitrarily. Then for every element $s$ that is fixed by $g$, there is a 1 on
the diagonal of the matrix $R_g$ (10.6.1), and for every element that is not fixed, there is a 0. □

When we decompose a set on which $G$ operates into orbits, we will obtain a
decomposition of the permutation representation $\rho$ or $R$ as a direct sum. This is easy to see. But
there is an important new feature: The fact that linear combinations are available allows us
to decompose the representation further. Even when the operation of $G$ on $S$ is transitive,
$\rho$ will not be irreducible unless $S$ is a set of one element.


Lemma 10.6.4 Let $R$ be the permutation representation associated to an operation of $G$ on
a finite nonempty ordered set $S$. When its character $\chi$ is written as an integer combination
of the irreducible characters, the trivial character $\chi_1$ appears.


Proof. The vector $\sum_s e_s$ of $V_S$, which corresponds to $(1, 1, \ldots, 1)^t$ in $\mathbb{C}^n$, is fixed by every
permutation of $S$, so it spans a $G$-invariant subspace of dimension 1 on which the group
operates trivially. □


Example 10.6.5 Let $G$ be the tetrahedral group $T$, and let $S$ be the set $(v_1, \ldots, v_4)$ of vertices
of the tetrahedron. The operation of $G$ on $S$ defines a four-dimensional representation of
$G$. Let $x$ denote the rotation by $2\pi/3$ about a face and $z$ the rotation by $\pi$ about an edge, as
before (see 7.10.8). Then $x$ acts as the 3-cycle $(2\,3\,4)$ and $z$ acts as $(1\,3)(2\,4)$. The associated
permutation representation is


(10.6.6) $R_x = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad R_z = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$

Its character is

                      1     x    x^2    z
(10.6.7) $\chi^{vert}$     4     1     1     0


The character table (10.4.14) shows that $\chi^{vert} = \chi_1 + \chi_4$. By the way, another way to
determine the character $\chi_4$ in the character table is to check that $\langle \chi^{vert}, \chi^{vert} \rangle = 2$. Then
$\chi^{vert}$ is a sum of two irreducible characters. Lemma 10.6.4 shows that one of them is the
trivial character $\chi_1$. So $\chi^{vert} - \chi_1$ is an irreducible character. It must be $\chi_4$. □
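
Lemma 10.6.3 reduces the character of a permutation representation to counting fixed points, which makes Example 10.6.5 easy to reproduce. The sketch below generates the group from the two permutations given for $x$ and $z$ (written 0-based, an encoding chosen here for convenience) and evaluates $\langle \chi^{vert}, \chi^{vert} \rangle$ and $\langle \chi^{vert}, \chi_1 \rangle$ by summing over all group elements. All function names are hypothetical.

```python
from fractions import Fraction

# Permutations of the vertex indices 0..3, stored as tuples: p[i] is the image of i.
X = (0, 2, 3, 1)   # the 3-cycle (2 3 4) in the text's 1-based notation
Z = (2, 3, 0, 1)   # the product (1 3)(2 4)


def compose(p, q):
    """Return the permutation 'p after q'."""
    return tuple(p[q[i]] for i in range(len(q)))


def generate(generators):
    """Close the generators under multiplication (enough for a finite group)."""
    identity = tuple(range(len(generators[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def fixed_points(p):
    """Lemma 10.6.3: the character value is the number of fixed elements."""
    return sum(1 for i, image in enumerate(p) if i == image)


def norm_squared(group):
    """<chi, chi> as in (10.4.3); the values are real integers."""
    return Fraction(sum(fixed_points(g) ** 2 for g in group), len(group))


def trivial_multiplicity(group):
    """<chi, chi_1>, the multiplicity of the trivial character."""
    return Fraction(sum(fixed_points(g) for g in group), len(group))


GROUP = generate([X, Z])

if __name__ == "__main__":
    print(len(GROUP), norm_squared(GROUP), trivial_multiplicity(GROUP))
```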


• The regular representation $\rho^{reg}$ of a group $G$ is the representation associated to the
operation of $G$ on itself by left multiplication. It is a representation on the vector space $V_G$
that has a basis $\{e_g\}$ indexed by elements of $G$. If $h$ is an element of $G$, then


(10.6.8) $\rho^{reg}_g(e_h) = e_{gh}.$


This operation of $G$ on itself by left multiplication isn't particularly interesting, but the
associated permutation representation $\rho^{reg}$ is very interesting. Its character $\chi^{reg}$ is simple:


(10.6.9) $\chi^{reg}(1) = |G|$, and $\chi^{reg}(g) = 0$ if $g \neq 1$.


This is true because the dimension of $\chi^{reg}$ is the order of the group, and because multiplication
by $g$ doesn't fix any element of $G$ unless $g = 1$.

This simple formula makes it easy to compute $\langle \chi^{reg}, \chi \rangle$ for any character $\chi$:

(10.6.10) $\langle \chi^{reg}, \chi \rangle = \frac{1}{|G|} \sum_g \overline{\chi^{reg}(g)}\, \chi(g) = \frac{1}{|G|}\, \overline{\chi^{reg}(1)}\, \chi(1) = \chi(1) = \dim \chi.$


Corollary 10.6.11 Let $\chi_1, \ldots, \chi_r$ be the irreducible characters of a finite group $G$, let $\rho_i$ be
a representation with character $\chi_i$, and let $d_i = \dim \chi_i$. Then $\chi^{reg} = d_1\chi_1 + \cdots + d_r\chi_r$,
and $\rho^{reg}$ is isomorphic to $d_1\rho_1 \oplus \cdots \oplus d_r\rho_r$.


This follows from (10.6.10) and the projection formula. Isn't it nice? Counting
dimensions,


(10.6.12) $|G| = \dim \chi^{reg} = \sum_{i=1}^{r} d_i \dim \chi_i = \sum_{i=1}^{r} d_i^2.$

This is the formula in (c) of the Main Theorem. So that formula follows from the orthogonality
relations (10.4.6)(a).

For instance, the character of the regular representation of the symmetric group $S_3$ is


                   1     x     y
    $\chi^{reg}$      6     0     0


Looking at the character table (10.4.12) for $S_3$, one sees that $\chi^{reg} = \chi_1 + \chi_2 + 2\chi_3$, as
expected.

Still one more way to determine the last character $\chi_4$ of the tetrahedral group (see
(10.4.14)) is to use the relation $\chi^{reg} = \chi_1 + \chi_2 + \chi_3 + 3\chi_4$.
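
The last remark can be carried out directly: solving $\chi^{reg} = \chi_1 + \chi_2 + \chi_3 + 3\chi_4$ for $\chi_4$ column by column. This is a small numerical sketch with hypothetical names; the three one-dimensional characters are taken from table (10.4.14), and the results are floating-point values.

```python
import cmath

w = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity

# One-dimensional characters of the tetrahedral group on the classes 1, x, x^2, z.
CHI_1 = [1, 1, 1, 1]
CHI_2 = [1, w, w**2, 1]
CHI_3 = [1, w**2, w, 1]

# Regular character (10.6.9): |G| = 12 at the identity, 0 elsewhere.
CHI_REG = [12, 0, 0, 0]


def solve_for_chi4():
    """chi_4 = (chi_reg - chi_1 - chi_2 - chi_3) / 3, evaluated on each class."""
    return [(r - a - b - c) / 3
            for r, a, b, c in zip(CHI_REG, CHI_1, CHI_2, CHI_3)]


if __name__ == "__main__":
    print([complex(round(complex(v).real, 6), round(complex(v).imag, 6))
           for v in solve_for_chi4()])
```

Up to rounding, the values should agree with the last row of (10.4.14).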

We determine the character table of the icosahedral group $I$ next. As we know, $I$ is
isomorphic to the alternating group $A_5$ (7.4.4). The conjugacy classes have been determined
before (7.4.1). They are listed below, with representative elements taken from $A_5$:


              class                                               representative
              $C_1 = \{1\}$                                          (1)
              $C_2$ = 15 edge rotations, angle $\pi$                   (12)(34)
(10.6.13)     $C_3$ = 20 vertex rotations, angles $\pm 2\pi/3$         (123)
              $C_4$ = 12 face rotations, angles $\pm 2\pi/5$           (12345)
              $C_5$ = 12 face rotations, angles $\pm 4\pi/5$           (13524)


Since there are five conjugacy classes, there are five irreducible characters. The
character table is


              (1)   (15)   (20)    (12)    (12)
               0     π     2π/3    2π/5    4π/5     angle
    $\chi_1$     1     1      1       1       1
    $\chi_2$     3    -1      0       α       β
    $\chi_3$     3    -1      0       β       α
    $\chi_4$     4     0      1      -1      -1
    $\chi_5$     5     1     -1       0       0


(10.6.14) Character table of the icosahedral group $I$

The entries $\alpha$ and $\beta$ are explained below. One way to find the irreducible characters is
to decompose some permutation representations. The alternating group $A_5$ operates on the
set of five indices. This gives us a five-dimensional permutation representation; we'll call it
$\rho'$. Its character $\chi'$ is


               0     π     2π/3    2π/5    4π/5     angle
    $\chi'$      5     1      2       0       0


Then $\langle \chi', \chi' \rangle = \frac{1}{60}(1 \cdot 5^2 + 15 \cdot 1^2 + 20 \cdot 2^2) = 2$. Therefore $\chi'$ is the sum of two distinct
irreducible characters. Since the trivial representation is a summand, $\chi' - \chi_1$ is an irreducible
character, the one labeled $\chi_4$ in the table.

Next, the icosahedral group $I$ operates on the set of six pairs of opposite faces of the
dodecahedron; let the corresponding six-dimensional character be $\chi''$. A similar computation
shows that $\chi'' - \chi_1$ is the irreducible character $\chi_5$.

We also have the representation of dimension 3 of $I$ as a rotation group. Its character
is $\chi_2$. To compute that character, we remember that the trace of a rotation of $\mathbb{R}^3$ with angle
$\theta$ is $1 + 2\cos\theta$, which is also equal to $1 + e^{i\theta} + e^{-i\theta}$ (5.1.28). The second and third entries for
$\chi_2$ are $1 + 2\cos\pi = -1$ and $1 + 2\cos(2\pi/3) = 0$. The last two entries are labeled


$\alpha = 1 + 2\cos(2\pi/5) = 1 + \zeta + \zeta^4$ and $\beta = 1 + 2\cos(4\pi/5) = 1 + \zeta^2 + \zeta^3$,


where $\zeta = e^{2\pi i/5}$. The remaining character $\chi_3$ can be determined by orthogonality, or by
using the relation
$\chi^{reg} = \chi_1 + 3\chi_2 + 3\chi_3 + 4\chi_4 + 5\chi_5.$
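
As a sanity check on table (10.6.14), the following sketch evaluates the rows numerically, with $\alpha$ and $\beta$ computed from their cosine formulas, and tests both orthonormality and the relation for $\chi^{reg}$. All entries are real, so no complex conjugation is needed; the names are hypothetical and the comparisons are only up to floating-point error.

```python
import math

# Class sizes for the angles 0, pi, 2pi/3, 2pi/5, 4pi/5.
SIZES = [1, 15, 20, 12, 12]
ORDER = sum(SIZES)  # 60

ALPHA = 1 + 2 * math.cos(2 * math.pi / 5)
BETA = 1 + 2 * math.cos(4 * math.pi / 5)

# Rows chi_1 .. chi_5 of table (10.6.14).
TABLE = [
    [1, 1, 1, 1, 1],
    [3, -1, 0, ALPHA, BETA],
    [3, -1, 0, BETA, ALPHA],
    [4, 0, 1, -1, -1],
    [5, 1, -1, 0, 0],
]


def inner(chi, chi_prime):
    """Class-sum product (10.4.4) for real-valued characters."""
    return sum(c * a * b for c, a, b in zip(SIZES, chi, chi_prime)) / ORDER


def regular_character():
    """sum_i d_i * chi_i, where d_i = chi_i(1) is the first entry of each row."""
    return [sum(row[0] * row[k] for row in TABLE) for k in range(len(SIZES))]


if __name__ == "__main__":
    for a in TABLE:
        print([round(inner(a, b), 6) for b in TABLE])
    print([round(v, 6) for v in regular_character()])
```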


10.7 SCHUR’S LEMMA 


Let $\rho$ and $\rho'$ be representations of a group $G$ on vector spaces $V$ and $V'$. A linear
transformation $T: V' \to V$ is called $G$-invariant if it is compatible with the operation of $G$,
meaning that for all $g$ in $G$,


(10.7.1) $T(gv') = gT(v')$, or $T \circ \rho'_g = \rho_g \circ T$,


as indicated by the diagram


(10.7.2) $\begin{array}{ccc} V' & \xrightarrow{\ T\ } & V \\ {\scriptstyle \rho'_g}\downarrow & & \downarrow{\scriptstyle \rho_g} \\ V' & \xrightarrow{\ T\ } & V \end{array}$


A bijective $G$-invariant linear transformation is an isomorphism of representations (10.1.16).
It is useful to rewrite the condition for $G$-invariance in the form


$T(v') = g^{-1}T(gv')$, or $\rho_g^{-1} T \rho'_g = T$.

This definition of a $G$-invariant linear transformation $T$ makes sense only when the
representations $\rho$ and $\rho'$ are given. It is important to keep this in mind when the ambiguous
group operation notation $T(gv') = gT(v')$ is used.

If bases $B$ and $B'$ for $V$ and $V'$ are given, and if $R_g$, $R'_g$, and $M$ denote the matrices of
$\rho_g$, $\rho'_g$, and $T$ with respect to these bases, the condition (10.7.1) becomes


(10.7.3) $MR'_g = R_gM$ or $R_g^{-1}MR'_g = M$

for all $g$ in $G$. A matrix $M$ is called $G$-invariant if it satisfies this condition.


Lemma 10.7.4 The kernel and the image of a $G$-invariant linear transformation $T: V' \to V$
are $G$-invariant subspaces of $V'$ and $V$, respectively.


Proof. The kernel and image of any linear transformation are subspaces. To show that the
kernel is $G$-invariant, we must show that if $x$ is in $\ker T$, then $gx$ is in $\ker T$, i.e., that if
$T(x) = 0$, then $T(gx) = 0$. This is true: $T(gx) = gT(x) = g0 = 0$. If $y$ is in the image of $T$,
i.e., $y = T(x)$ for some $x$ in $V'$, then $gy = gT(x) = T(gx)$, so $gy$ is in the image too. □


Similarly, if $\rho$ is a representation of $G$ on $V$, a linear operator $T$ on $V$ is $G$-invariant if

(10.7.5) $T(gv) = gT(v)$, or $\rho_g \circ T = T \circ \rho_g$, for all $g$ in $G$,


which means that $T$ commutes with each of the operators $\rho_g$. The matrix form of this
condition is

$R_gM = MR_g$ or $M = R_g^{-1}MR_g$, for all $g$ in $G$.


Because a G-invariant linear operator T must commute with all of the operators pg, 
invariance is a strong condition. Schur’s Lemma shows this. 


Theorem 10.7.6 Schur’s Lemma. 


(a) Let $\rho$ and $\rho'$ be irreducible representations of $G$ on vector spaces $V$ and $V'$, respectively,
and let $T: V' \to V$ be a $G$-invariant transformation. Either $T$ is an isomorphism, or else
$T = 0$.

(b) Let $\rho$ be an irreducible representation of $G$ on a vector space $V$, and let $T: V \to V$ be
a $G$-invariant linear operator. Then $T$ is multiplication by a scalar: $T = cI$.


Proof. (a) Suppose that $T$ is not the zero map. Since $\rho'$ is irreducible and since $\ker T$ is
a $G$-invariant subspace, $\ker T$ is either $V'$ or $\{0\}$. It is not $V'$ because $T \neq 0$. Therefore
$\ker T = \{0\}$, and $T$ is injective. Since $\rho$ is irreducible and $\operatorname{im} T$ is $G$-invariant, $\operatorname{im} T$ is either
$\{0\}$ or $V$. It is not $\{0\}$ because $T \neq 0$. Therefore $\operatorname{im} T = V$ and $T$ is surjective.


(b) Suppose that $T$ is a $G$-invariant linear operator on $V$. We choose an eigenvalue $\lambda$
of $T$. The linear operator $S = T - \lambda I$ is also $G$-invariant. The kernel of $S$ isn't zero
because it contains an eigenvector of $T$. Therefore $S$ is not an isomorphism. By (a),
$S = 0$ and $T = \lambda I$. □


Suppose that we are given representations $\rho$ and $\rho'$ on spaces $V$ and $V'$. Though
$G$-invariant linear transformations are rare, the averaging process can be used to create a
$G$-invariant transformation from any linear transformation $T: V' \to V$. The average is the
linear transformation $\widetilde{T}$ defined by


(10.7.7) $\widetilde{T}(v') = \frac{1}{|G|} \sum_{g \in G} g^{-1}\big(T(gv')\big)$, or $\widetilde{T} = \frac{1}{|G|} \sum_{g \in G} \rho_g^{-1} T \rho'_g.$


Similarly, if we are given matrix representations $R$ and $R'$ of $G$ of dimensions $m$ and $n$, and
if $M$ is any $m \times n$ matrix, then the averaged matrix is


(10.7.8) $\widetilde{M} = \frac{1}{|G|} \sum_{g \in G} R_g^{-1} M R'_g.$


Lemma 10.7.9 With the above notation, $\widetilde{T}$ is a $G$-invariant linear transformation, and $\widetilde{M}$ is a
$G$-invariant matrix. If $T$ is $G$-invariant, then $\widetilde{T} = T$, and if $M$ is $G$-invariant, then $\widetilde{M} = M$.


Proof. Since compositions and sums of linear transformations are linear, $\widetilde{T}$ is a linear
transformation, and it is easy to see that $\widetilde{T} = T$ if $T$ is invariant. To show that $\widetilde{T}$ is invariant,
we let $h$ be an element of $G$ and we show that $\widetilde{T} = h^{-1}\widetilde{T}h$. We make the substitution
$g_1 = gh$. Reindexing as in (10.2.3),


$h^{-1}\widetilde{T}h = h^{-1}\Big(\frac{1}{|G|}\sum_g g^{-1}Tg\Big)h = \frac{1}{|G|}\sum_g (gh)^{-1}T(gh) = \frac{1}{|G|}\sum_{g_1} g_1^{-1}Tg_1 = \widetilde{T}.$


The proof that $\widetilde{M}$ is invariant is analogous. □
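
The averaging process is concrete enough to try on a small case. The sketch below takes $R = R'$ to be the permutation representation of $S_3$ on $\mathbb{C}^3$ (an example chosen here, not one worked in the text), averages an arbitrary integer matrix as in (10.7.8), and checks that the result commutes with every $R_g$. Exact rational arithmetic is used, and all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def perm_matrix(p):
    """Permutation matrix with R e_i = e_{p(i)}: entry (p[i], i) is 1."""
    n = len(p)
    return [[Fraction(1 if p[j] == i else 0) for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def average(M, reps):
    """(10.7.8) with R' = R: (1/|G|) * sum_g R_g^{-1} M R_g.

    For a permutation matrix the inverse is the transpose.
    """
    n = len(M)
    total = [[Fraction(0)] * n for _ in range(n)]
    for R in reps:
        term = matmul(matmul(transpose(R), M), R)
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return [[entry / len(reps) for entry in row] for row in total]


REPS = [perm_matrix(p) for p in permutations(range(3))]   # all six elements of S3
M = [[1, 2, 0], [0, 3, 1], [4, 0, 5]]                      # arbitrary test matrix
M_AVG = average(M, REPS)

if __name__ == "__main__":
    for row in M_AVG:
        print([str(entry) for entry in row])
```

In this example the averaged matrix has a constant diagonal and a constant off-diagonal value, and its trace agrees with that of `M`, in line with Proposition 10.7.10.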


The averaging process may yield $\widetilde{T} = 0$, the trivial transformation, though $T$ was
not zero. Schur's Lemma tells us that this must happen if $\rho$ and $\rho'$ are irreducible and not
isomorphic. This fact is the basis of the proof given in the next section that distinct irreducible
characters are orthogonal. For linear operators, the average is often not zero, because trace
is preserved by the averaging process.


Proposition 10.7.10 Let $\rho$ be an irreducible representation of $G$ on a vector space $V$.
Let $T: V \to V$ be a linear operator, and let $\widetilde{T}$ be as in (10.7.7), with $\rho' = \rho$. Then
$\operatorname{trace} \widetilde{T} = \operatorname{trace} T$. If $\operatorname{trace} T \neq 0$, then $\widetilde{T} \neq 0$. □


10.8 PROOF OF THE ORTHOGONALITY RELATIONS 
We will now prove (a) of the Main Theorem. We use matrix notation. Let $\mathcal{M}$ denote the
space $\mathbb{C}^{m \times n}$ of $m \times n$ matrices.


Lemma 10.8.1 Let $A$ and $B$ be $m \times m$ and $n \times n$ matrices respectively, and let $F$ be the linear
operator on $\mathcal{M}$ defined by $F(M) = AMB$. The trace of $F$ is the product $(\operatorname{trace} A)(\operatorname{trace} B)$.

Proof. The trace of an operator is the sum of its eigenvalues. Let $\alpha_1, \ldots, \alpha_m$ and $\beta_1, \ldots, \beta_n$
be the eigenvalues of $A$ and $B^t$ respectively. If $X_i$ is an eigenvector of $A$ with eigenvalue $\alpha_i$,
and $Y_j$ is an eigenvector of $B^t$ with eigenvalue $\beta_j$, the $m \times n$ matrix $M = X_iY_j^t$ is an eigenvector
for the operator $F$, with eigenvalue $\alpha_i\beta_j$. Since the dimension of $\mathcal{M}$ is $mn$, the $mn$ complex
numbers $\alpha_i\beta_j$ are all of the eigenvalues, provided that they are distinct. If so, then


$\operatorname{trace} F = \sum_{i,j} \alpha_i\beta_j = \Big(\sum_i \alpha_i\Big)\Big(\sum_j \beta_j\Big) = (\operatorname{trace} A)(\operatorname{trace} B).$


In general, there will be matrices $A'$ and $B'$ arbitrarily close to $A$ and $B$ such that the products
of their eigenvalues are distinct, and the lemma follows by continuity (see Section 5.2). □
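
Lemma 10.8.1 can also be checked without eigenvalues, by writing $F$ in the basis of matrix units $E_{ij}$ and summing the diagonal coefficients. The sketch below does this for one arbitrary pair of integer matrices; it illustrates the statement rather than following the proof above, and the function names are hypothetical.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def trace_of_F(A, B):
    """Trace of M -> A M B on m x n matrices, computed in the basis of matrix units.

    The diagonal coefficient for the basis element E_ij is the (i, j) entry of A E_ij B.
    """
    m, n = len(A), len(B)
    total = 0
    for i in range(m):
        for j in range(n):
            E = [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(m)]
            image = matmul(matmul(A, E), B)
            total += image[i][j]
    return total


A = [[2, 1], [0, 3]]                       # arbitrary 2 x 2 example
B = [[1, 4, 0], [2, -1, 5], [0, 3, 2]]     # arbitrary 3 x 3 example

if __name__ == "__main__":
    print(trace_of_F(A, B), trace(A) * trace(B))
```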


Let $\rho$ and $\rho'$ be representations of dimensions $m$ and $n$, with characters $\chi$ and $\chi'$
respectively, and let $R$ and $R'$ be the matrix representations obtained from $\rho$ and $\rho'$ using
some arbitrary bases. We define a linear operator $\Phi$ on the space $\mathcal{M}$ by


(10.8.2) $\Phi(M) = \frac{1}{|G|} \sum_g R_g^{-1} M R'_g = \widetilde{M}.$

In the last section, we saw that $\widetilde{M}$ is a $G$-invariant matrix, and that $\widetilde{M} = M$ if $M$ is invariant.
Therefore the image of $\Phi$ is the space of $G$-invariant matrices. We denote that space by $\widetilde{\mathcal{M}}$.
Parts (a) and (b) of the next lemma compute the trace of the operator $\Phi$ in two ways.
The orthogonality relations are part (c).


Lemma 10.8.3 With the above notation,

(a) $\operatorname{trace} \Phi = \langle \chi, \chi' \rangle$.

(b) $\operatorname{trace} \Phi = \dim \widetilde{\mathcal{M}}$.

(c) If $\rho$ is an irreducible representation, $\langle \chi, \chi \rangle = 1$, and if $\rho$ and $\rho'$ are non-isomorphic
irreducible representations, $\langle \chi, \chi' \rangle = 0$.


Proof. (a) We recall that $\chi(g^{-1}) = \overline{\chi(g)}$ (10.4.2)(d). Let $F_g$ denote the linear operator on
$\mathcal{M}$ defined by $F_g(M) = R_g^{-1} M R'_g$.
Since trace is linear, Lemma 10.8.1 shows that


(10.8.4) $\operatorname{trace} \Phi = \frac{1}{|G|} \sum_g \operatorname{trace} F_g = \frac{1}{|G|} \sum_g (\operatorname{trace} R_g^{-1})(\operatorname{trace} R'_g)$
$\qquad = \frac{1}{|G|} \sum_g \chi(g^{-1})\chi'(g) = \frac{1}{|G|} \sum_g \overline{\chi(g)}\,\chi'(g) = \langle \chi, \chi' \rangle.$


(b) Let $\mathcal{N}$ be the kernel of $\Phi$. If $M$ is in the intersection $\widetilde{\mathcal{M}} \cap \mathcal{N}$, then $\Phi(M) = M$ and also
$\Phi(M) = 0$, so $M = 0$. The intersection is the zero space. Therefore $\mathcal{M}$ is the direct sum
$\widetilde{\mathcal{M}} \oplus \mathcal{N}$ (4.3.1)(b). We choose a basis for $\mathcal{M}$ by appending bases of $\widetilde{\mathcal{M}}$ and $\mathcal{N}$. Since $\widetilde{M} = M$
if $M$ is invariant, $\Phi$ is the identity on $\widetilde{\mathcal{M}}$. So the matrix of $\Phi$ will have the block form


$\begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix},$


where $I$ is the identity matrix of size $\dim \widetilde{\mathcal{M}}$. Its trace is equal to the dimension of $\widetilde{\mathcal{M}}$.

(c) We apply (a) and (b): $\langle \chi, \chi' \rangle = \dim \widetilde{\mathcal{M}}$. If $\rho'$ and $\rho$ are irreducible and not isomorphic,
Schur's Lemma tells us that the only $G$-invariant operator is zero, and so the only
$G$-invariant matrix is the zero matrix. Therefore $\widetilde{\mathcal{M}} = \{0\}$ and $\langle \chi, \chi' \rangle = 0$. If $\rho' = \rho$, Schur's
Lemma says that the $G$-invariant matrices have the form $cI$. Then $\widetilde{\mathcal{M}}$ has dimension 1,
and $\langle \chi, \chi' \rangle = 1$. □


We go over to operator notation for the proof of Theorem 10.4.6(b), that the number
of irreducible characters is equal to the number of conjugacy classes in the group. As before,
$\mathcal{H}$ denotes the space of class functions. Its dimension is equal to the number of conjugacy
classes (see (10.4.11)). Let $\mathcal{C}$ denote the subspace of $\mathcal{H}$ spanned by the characters. We
show that $\mathcal{C} = \mathcal{H}$ by showing that the orthogonal space to $\mathcal{C}$ in $\mathcal{H}$ is zero. The next lemma
does this.


Lemma 10.8.5


(a) Let $\varphi$ be a class function on $G$ that is orthogonal to every character. For any
representation $\rho$ of $G$, $\frac{1}{|G|} \sum_g \overline{\varphi(g)}\, \rho_g$ is the zero operator.

(b) Let $\rho^{reg}$ be the regular representation of $G$. The operators $\rho^{reg}_g$ with $g$ in $G$ are linearly
independent.

(c) The only class function $\varphi$ that is orthogonal to every character is the zero function.


Proof. (a) Since any representation is a direct sum of irreducible representations, we may
assume that $\rho$ is irreducible. Let $T = \frac{1}{|G|} \sum_g \overline{\varphi(g)}\, \rho_g$. We first show that $T$ is a $G$-invariant
operator, i.e., that $T = \rho_h^{-1} T \rho_h$ for every $h$ in $G$. Let $g'' = h^{-1}gh$. Then as $g$ runs over the
group $G$, so does $g''$. Since $\rho$ is a homomorphism, $\rho_h^{-1}\rho_g\rho_h = \rho_{g''}$, and because $\varphi$ is a class
function, $\varphi(g) = \varphi(g'')$. Therefore


$\rho_h^{-1} T \rho_h = \frac{1}{|G|} \sum_g \overline{\varphi(g)}\, \rho_{g''} = \frac{1}{|G|} \sum_g \overline{\varphi(g'')}\, \rho_{g''} = \frac{1}{|G|} \sum_{g''} \overline{\varphi(g'')}\, \rho_{g''} = T.$


Let $\chi$ be the character of $\rho$. The trace of $T$ is $\frac{1}{|G|} \sum_g \overline{\varphi(g)}\, \chi(g) = \langle \varphi, \chi \rangle$. The trace is
zero because $\varphi$ is orthogonal to $\chi$. Since $\rho$ is irreducible, Schur's Lemma tells us that $T$ is
multiplication by a scalar, and since its trace is zero, $T = 0$.


(b) We apply Formula 10.6.8 to the basis element $e_1$ of $V_G$: $\rho^{reg}_g(e_1) = e_g$. Then since the
vectors $e_g$ are independent elements of $V_G$, the operators $\rho^{reg}_g$ are independent too.


(c) Let $\varphi$ be a class function orthogonal to every character. (a) tells us that $\sum_g \overline{\varphi(g)}\, \rho^{reg}_g = 0$
is a linear relation among the operators $\rho^{reg}_g$, which are independent by (b). Therefore all of
the coefficients $\overline{\varphi(g)}$ are zero, and $\varphi$ is the zero function. □


10.9 REPRESENTATIONS OF SU2 


Remarkably, the orthogonality relations carry over to compact groups, matrix groups that
are compact subsets of spaces of matrices, when summation over the group is replaced by
an integral. In this section, we verify this for some representations of the special unitary
group $SU_2$.


We begin by defining the representations that we will analyze. Let $H_n$ denote the
complex vector space of homogeneous polynomials of degree $n$ in the variables $u, v$, of
the form


(10.9.1) $f(u, v) = c_0u^n + c_1u^{n-1}v + \cdots + c_{n-1}uv^{n-1} + c_nv^n.$

We define a representation

(10.9.2) $\rho_n: SU_2 \to GL(H_n)$


as follows: The result of operating by an element $P$ of $SU_2$ on a polynomial $f$ in $H_n$ will be
another polynomial that we denote by $[Pf]$. The definition is


(10.9.3) $[Pf](u, v) = f(ua + vb,\ -u\overline{b} + v\overline{a})$, where $P = \begin{pmatrix} a & -\overline{b} \\ b & \overline{a} \end{pmatrix}.$


In words, $P$ operates by substituting $(u, v)P$ for the variables $(u, v)$. Thus
$[Pu^iv^j] = (ua + vb)^i(-u\overline{b} + v\overline{a})^j.$


It is easy to compute the matrix of this operator when $P$ is diagonal. Let $\alpha = e^{i\theta}$, and let


(10.9.4) $A_\theta = \begin{pmatrix} e^{i\theta} & \\ & e^{-i\theta} \end{pmatrix} = \begin{pmatrix} \alpha & \\ & \overline{\alpha} \end{pmatrix} = \begin{pmatrix} \alpha & \\ & \alpha^{-1} \end{pmatrix}.$


Then $[A_\theta u^iv^j] = (u\alpha)^i(v\overline{\alpha})^j = u^iv^j\alpha^{i-j}$. So $A_\theta$ acts on the basis $(u^n, u^{n-1}v, \ldots, uv^{n-1}, v^n)$
of the space $H_n$ as the diagonal matrix


$\begin{pmatrix} \alpha^n & & & \\ & \alpha^{n-2} & & \\ & & \ddots & \\ & & & \alpha^{-n} \end{pmatrix}.$


The character $\chi_n$ of the representation $\rho_n$ is defined as before: $\chi_n(g) = \operatorname{trace} \rho_{n,g}$. It
is constant on the conjugacy classes, which are the latitudes on the sphere $SU_2$. Because of
this, it is enough to compute the characters $\chi_n$ on one matrix in each latitude, and we use
$A_\theta$. To simplify notation, we write $\chi_n(\theta)$ for $\chi_n(A_\theta)$. The character is


$\chi_0(\theta) = 1$

$\chi_1(\theta) = \alpha + \alpha^{-1}$

$\chi_2(\theta) = \alpha^2 + 1 + \alpha^{-2}$

$\cdots$

(10.9.5) $\chi_n(\theta) = \alpha^n + \alpha^{n-2} + \cdots + \alpha^{-n} = \dfrac{\alpha^{n+1} - \alpha^{-(n+1)}}{\alpha - \alpha^{-1}}.$

The Hermitian product that replaces (10.4.3) is


(10.9.6) $\langle \chi_m, \chi_n \rangle = \frac{1}{|G|} \int_G \overline{\chi_m(g)}\, \chi_n(g)\, dV.$


In this formula $G$ stands for the group $SU_2$, the unit 3-sphere, $|G|$ is the three-dimensional
volume of the unit sphere, and $dV$ stands for the integral with respect to three-dimensional
volume. The characters happen to be real-valued functions, so the complex conjugation that
appears in the formula is irrelevant.


Theorem 10.9.7 The characters of $SU_2$ that are defined above are orthonormal: $\langle \chi_m, \chi_n \rangle = 0$
if $m \neq n$, and $\langle \chi_n, \chi_n \rangle = 1$.


Proof. Since the characters are constant on the latitudes, we can evaluate the integral (10.9.6)
by slicing, as we learn to do in calculus. We use the unit circle $x_0 = \cos\theta$, $x_1 = \sin\theta$, and
$x_2 = \cdots = x_n = 0$ to parametrize the slices of the unit $n$-sphere $S^n: \{x_0^2 + x_1^2 + \cdots + x_n^2 = 1\}$.
So $\theta = 0$ is the north pole, and $\theta = \pi$ is the south pole (see Section 9.2). For $0 < \theta < \pi$, the
slice of the unit $n$-sphere is an $(n-1)$-sphere of radius $\sin\theta$.


To compute an integral by slicing, we integrate with respect to arc length on the unit
circle. Let $\mathrm{vol}_n(r)$ denote the $n$-dimensional volume of the $n$-sphere of radius $r$. So $\mathrm{vol}_1(r)$
is the arc length of the circle of radius $r$, and $\mathrm{vol}_2(r)$ is the surface area of the 2-sphere of
radius $r$. If $f$ is a function on the unit $n$-sphere $S^n$ that is constant on the slices $\theta = c$, its
integral will be


(10.9.8) $\int_{S^n} f\, dV_n = \int_0^\pi f(\theta)\, \mathrm{vol}_{n-1}(\sin\theta)\, d\theta,$


where $dV_n$ denotes integration with respect to $n$-dimensional volume, and $f(\theta)$ denotes the
value of $f$ on the slice.


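Integration by slicing provides a recursive formula for the volumes of the spheres:

(10.9.9) $\mathrm{vol}_n(1) = \int_{S^n} 1\, dV_n = \int_0^\pi \mathrm{vol}_{n-1}(\sin\theta)\, d\theta,$


and $\mathrm{vol}_n(r) = r^n\,\mathrm{vol}_n(1)$. The zero-sphere $x^2 = r^2$ consists of two points. Its
zero-dimensional volume is 2. So


$\mathrm{vol}_1(r) = r\int_0^\pi \mathrm{vol}_0(\sin\theta)\, d\theta = r\int_0^\pi 2\, d\theta = 2\pi r,$

(10.9.10) $\mathrm{vol}_2(r) = r^2\int_0^\pi \mathrm{vol}_1(\sin\theta)\, d\theta = r^2\int_0^\pi 2\pi\sin\theta\, d\theta = 4\pi r^2,$

$\mathrm{vol}_3(r) = r^3\int_0^\pi \mathrm{vol}_2(\sin\theta)\, d\theta = r^3\int_0^\pi 4\pi\sin^2\theta\, d\theta = 2\pi^2r^3.$


To evaluate the last integral, it is convenient to use the formula $\sin\theta = -i(\alpha - \alpha^{-1})/2$.


(10.9.11) $\mathrm{vol}_2(\sin\theta) = 4\pi\sin^2\theta = -\pi(\alpha - \alpha^{-1})^2.$

Expanding, $\mathrm{vol}_2(\sin\theta) = \pi\big(2 - (\alpha^2 + \alpha^{-2})\big)$. The integral of $\alpha^2 + \alpha^{-2}$ is zero:


(10.9.12) $\int_0^\pi (\alpha^k + \alpha^{-k})\, d\theta = \int_0^{2\pi} \alpha^k\, d\theta = \begin{cases} 0 & \text{if } k > 0, \\ 2\pi & \text{if } k = 0. \end{cases}$


We now compute the integral (10.9.6). The volume of the group $SU_2$ is

(10.9.13) $\mathrm{vol}_3(1) = 2\pi^2.$

The latitude sphere that contains $A_\theta$ has radius $\sin\theta$. Since the characters are real, integration
by slicing gives


(10.9.14)
$\langle \chi_m, \chi_n \rangle = \frac{1}{2\pi^2}\int_0^\pi \chi_m(\theta)\chi_n(\theta)\, \mathrm{vol}_2(\sin\theta)\, d\theta$

$\quad = \frac{1}{2\pi^2}\int_0^\pi \Big(\frac{\alpha^{m+1} - \alpha^{-(m+1)}}{\alpha - \alpha^{-1}}\Big)\Big(\frac{\alpha^{n+1} - \alpha^{-(n+1)}}{\alpha - \alpha^{-1}}\Big)\big(-\pi(\alpha - \alpha^{-1})^2\big)\, d\theta$

$\quad = -\frac{1}{2\pi}\int_0^\pi \big(\alpha^{m+n+2} + \alpha^{-(m+n+2)}\big)\, d\theta + \frac{1}{2\pi}\int_0^\pi \big(\alpha^{m-n} + \alpha^{n-m}\big)\, d\theta.$


This evaluates to 1 if $m = n$ and to zero otherwise (see (10.9.12)). The characters $\chi_n$ are
orthonormal. □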
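
The slicing integral in (10.9.14) can be approximated numerically as a cross-check. The sketch below evaluates $\chi_n(\theta)$ as the real sum $\sum_k \cos((n - 2k)\theta)$, which equals $\alpha^n + \alpha^{n-2} + \cdots + \alpha^{-n}$, and integrates against $\mathrm{vol}_2(\sin\theta) = 4\pi\sin^2\theta$ with a midpoint rule. The step count and function names are choices made here for illustration.

```python
import math


def chi(n, theta):
    """chi_n(theta) = alpha^n + alpha^(n-2) + ... + alpha^(-n), with alpha = e^{i theta}.

    The imaginary parts cancel in pairs, leaving a sum of cosines.
    """
    return sum(math.cos((n - 2 * k) * theta) for k in range(n + 1))


def inner(m, n, steps=2000):
    """Midpoint-rule approximation of (10.9.14):

    (1 / (2 pi^2)) * integral_0^pi chi_m(t) * chi_n(t) * 4 pi sin^2(t) dt.
    """
    h = math.pi / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += chi(m, t) * chi(n, t) * 4 * math.pi * math.sin(t) ** 2
    return total * h / (2 * math.pi ** 2)


if __name__ == "__main__":
    for m in range(4):
        print([round(inner(m, n), 6) for n in range(4)])
```

The printed matrix should be close to the identity, matching Theorem 10.9.7 for the small values of $m$ and $n$ tried.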


We won’t prove the next theorem, though the proof follows the case of finite groups 
fairly closely. If you are interested, see [Sepanski]. 


Theorem 10.9.15 Every continuous representation of $SU_2$ is isomorphic to a direct sum of
the representations $\rho_n$ (10.9.2).


We leave the obvious generalizations to the reader.


—Israel Herstein


EXERCISES 


Section 1 Definitions 
1.1. Show that the image of a representation of dimension 1 of a finite group is a cyclic group. 
1.2. (a) Choose a suitable basis for R°? and write the standard representation of the octahedral 


group O explicitly. (b) Do the same for the dihedral group Dn. 
Section 2 Irreducible Representations 


2.1. Prove that the standard three-dimensional representation of the tetrahedral group T is 
irreducible as a complex representation. 
2.2. Consider the standard two-dimensional representation of the dihedral group $D_n$. For which $n$ is this an irreducible complex representation?

2.3. Suppose given a representation of the symmetric group $S_3$ on a vector space $V$. Let $x$ and $y$ denote the usual generators for $S_3$.

(a) Let $u$ be a nonzero vector in $V$. Let $v = u + xu + x^2u$ and $w = u + yu$. By analyzing the $G$-orbits of $v$, $w$, show that $V$ contains a nonzero invariant subspace of dimension at most 2.

(b) Prove that all irreducible two-dimensional representations of G are isomorphic, and 
determine all irreducible representations of G. 


Section 3 Unitary Representations 


3.1. Let $G$ be a cyclic group of order 3. The matrix $A = \begin{pmatrix} -1 & -1 \\ 1 & 0 \end{pmatrix}$ has order 3, so it defines a matrix representation of $G$. Use the averaging process to produce a $G$-invariant form from the standard Hermitian product $X^*Y$ on $\mathbb{C}^2$.

3.2. Let $\rho: G \to GL(V)$ be a representation of a finite group on a real vector space $V$. Prove the following:

(a) There exists a $G$-invariant, positive definite symmetric form $\langle\,,\rangle$ on $V$.
(b) $\rho$ is a direct sum of irreducible representations.
(c) Every finite subgroup of $GL_n(\mathbb{R})$ is conjugate to a subgroup of $O_n$.

3.3. (a) Let $R: G \to SL_2(\mathbb{R})$ be a faithful representation of a finite group by real 2 × 2 matrices with determinant 1. Use the results of Exercise 3.2 to prove that $G$ is a cyclic group.

(b) Determine the finite groups that have faithful real two-dimensional representations.
(c) Determine the finite groups that have faithful real three-dimensional representations with determinant 1.

3.4. Let $\langle\,,\rangle$ be a nondegenerate skew-symmetric form on a vector space $V$, and let $\rho$ be a representation of a finite group $G$ on $V$. Prove that the averaging process (10.3.7) produces a $G$-invariant skew-symmetric form on $V$, and show by example that the form obtained in this way needn’t be nondegenerate.

3.5. Let $x$ be a generator of a cyclic group $G$ of order $p$. Sending $x \rightsquigarrow \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ defines a matrix representation $G \to GL_2(\mathbb{F}_p)$. Prove that this representation is not the direct sum of irreducible representations.


Section 4 Characters 


4.1. Find the dimensions of the irreducible representations of the octahedral group, the quaternion group, and the dihedral groups $D_4$, $D_5$, and $D_6$.

4.2. A nonabelian group $G$ has order 55. Determine its class equation and the dimensions of its irreducible characters.


4.3. Determine the character tables for 


(a) the Klein four group, 
(b) the quaternion group, 
(c) the dihedral group $D_4$,
(d) the dihedral group $D_6$,
(e) a nonabelian group of order 21 (see Proposition 7.7.7).

4.4. Let $G$ be the dihedral group $D_5$, presented with generators $x$, $y$ and relations $x^5 = 1$, $y^2 = 1$, $yxy^{-1} = x^{-1}$, and let $\chi$ be an arbitrary two-dimensional character of $G$.

(a) What does the relation $x^5 = 1$ tell us about $\chi(x)$?
(b) What does the fact that $x$ and $x^{-1}$ are conjugate tell us about $\chi(x)$?
(c) Determine the character table of $G$.
(d) Decompose the restriction of each irreducible character of $D_5$ into irreducible characters of $C_5$.

4.5. Let $G = \langle x, y \mid x^5, y^4, yxy^{-1}x^{-2}\rangle$. Determine the character table of $G$.

4.6. Explain how to adjust the entries of a character table to produce a unitary matrix, and prove that the columns of a character table are orthogonal.

4.7. Let $\pi: G \to G' = G/N$ be the canonical map from a finite group to a quotient group, and let $\rho'$ be an irreducible representation of $G'$. Prove that the representation $\rho = \rho' \circ \pi$ of $G$ is irreducible in two ways: directly, and using Theorem 10.4.6.

4.8. Find the missing rows in the character table below:

gd) @ © © @®

*4.9. Below is a partial character table. One conjugacy class is missing.

            (1)  (1)  (2)  (2)  (3)
             1    u    v    w    x
$\chi_1$     1    1    1    1    1
$\chi_2$     1    1    1    1   -1
$\chi_3$     1   -1    1   -1    i
$\chi_4$     1   -1    1   -1   -i
$\chi_5$     2    2   -1   -1    0

(a) Complete the table.
(b) Determine the orders of representative elements in each conjugacy class.
(c) Determine the normal subgroups.
(d) Describe the group.

4.10. (a) Find the missing rows in the character table below.
(b) Determine the orders of the elements $a$, $b$, $c$, $d$.
(c) Show that the group $G$ with this character table has a subgroup $H$ of order 10, and describe this subgroup as a union of conjugacy classes.
(d) Decide whether $H$ is $C_{10}$ or $D_5$.
(e) Determine all normal subgroups of $G$.

            (1)  (4)  (5)  (5)  (5)
             1    a    b    c    d
$\chi_1$     1    1    1    1    1
$\chi_2$     1    1   -1   -1    1
$\chi_3$     1    1   -i    i   -1
$\chi_4$     1    1    i   -i   -1


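*4.11. In the character table below, $\omega = e^{2\pi i/3}$.

            (1)  (6)  (7)  (7)               (7)               (7)              (7)
             1    a    b    c                 d                 e                f
$\chi_1$     1    1    1    1                 1                 1                1
$\chi_2$     1    1    1    $\omega$          $\bar\omega$      $\omega$         $\bar\omega$
$\chi_3$     1    1    1    $\bar\omega$      $\omega$          $\bar\omega$     $\omega$
$\chi_4$     1    1   -1    $-\omega$         $-\bar\omega$     $\omega$         $\bar\omega$
$\chi_5$     1    1   -1    $-\bar\omega$     $-\omega$         $\bar\omega$     $\omega$
$\chi_6$     1    1   -1   -1                -1                 1                1
$\chi_7$     6   -1    0    0                 0                 0                0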


(a) Show that G has a normal subgroup N isomorphic to D7. 

(b) Decompose the restrictions of each character to N into irreducible N-characters. 
(c) Determine the numbers of Sylow p-subgroups, for p = 2, 3, and 7. 

(d) Determine the orders of the representative elements c, d, e, f. 

(e) Determine all normal subgroups of G. 


4.12. Let $H$ be a subgroup of index 2 of a group $G$, and let $\sigma: H \to GL(V)$ be a representation. Let $a$ be an element of $G$ not in $H$. Define a conjugate representation $\sigma': H \to GL(V)$ by the rule $\sigma'(h) = \sigma(a^{-1}ha)$. Prove that

(a) $\sigma'$ is a representation of $H$.

(b) If $\sigma$ is the restriction to $H$ of a representation of $G$, then $\sigma'$ is isomorphic to $\sigma$.

(c) If $b$ is another element of $G$ not in $H$, then the representation $\sigma''(h) = \sigma(b^{-1}hb)$ is isomorphic to $\sigma'$.


Section 5 One-Dimensional Characters 


5.1. Decompose the standard two-dimensional representation of the cyclic group $C_n$ by rotations into irreducible (complex) representations.

5.2. Prove that the sign representation $p \rightsquigarrow \operatorname{sign} p$ and the trivial representation are the only one-dimensional representations of the symmetric group $S_n$.

5.3. Suppose that a group $G$ has exactly two irreducible characters of dimension 1, and let $\chi$ denote the nontrivial one-dimensional character. Prove that for all $g$ in $G$, $\chi(g) = \pm 1$.

5.4. Let $\chi$ be the character of a representation $\rho$ of dimension $d$. Prove that $|\chi(g)| \le d$ for all $g$ in $G$, and that if $|\chi(g)| = d$, then $\rho(g) = \zeta I$, for some root of unity $\zeta$. Moreover, if $\chi(g) = d$, then $\rho_g$ is the identity operator.

5.5. Prove that the one-dimensional characters of a group $G$ form a group under multiplication of functions. This group is called the character group of $G$, and is often denoted by $\hat G$. Prove that if $G$ is abelian, then $|\hat G| = |G|$ and $\hat G \approx G$.

5.6. Let $G$ be a cyclic group of order $n$, generated by an element $x$, and let $\zeta = e^{2\pi i/n}$.

(a) Prove that the irreducible representations are $\rho_0, \ldots, \rho_{n-1}$, where $\rho_k: G \to \mathbb{C}^\times$ is defined by $\rho_k(x) = \zeta^k$.

(b) Identify the character group of $G$ (see Exercise 5.5).

5.7. (a) Let $\varphi: G \to G'$ be a homomorphism of abelian groups. Define an induced homomorphism $\hat\varphi: \hat G' \to \hat G$ between their character groups (see Exercise 5.5).

(b) Prove that if $\varphi$ is injective, then $\hat\varphi$ is surjective, and conversely.


Section 6 The Regular Representation

6.1. Let $R^{\mathrm{reg}}$ denote the regular matrix representation of a group $G$. Determine $\sum_g R^{\mathrm{reg}}_g$.

6.2. Let $\rho$ be the permutation representation associated to the operation of $D_3$ on itself by conjugation. Decompose the character of $\rho$ into irreducible characters.

6.3. Let $\chi^e$ denote the character of the representation of the tetrahedral group $T$ on the six edges of the tetrahedron. Decompose this character into irreducible characters.

6.4. (a) Identify the five conjugacy classes in the octahedral group $O$, and find the orders of its irreducible representations.

(b) The group $O$ operates on these sets:

six faces of the cube,
three pairs of opposite faces,
eight vertices,
four pairs of opposite vertices,
six pairs of opposite edges,
two inscribed tetrahedra.

Decompose the corresponding characters into irreducible characters.

(c) Compute the character table for $O$.

6.5. The symmetric group $S_n$ operates on $\mathbb{C}^n$ by permuting the coordinates. Decompose this representation explicitly into irreducible representations.

Hint: I recommend against using the orthogonality relations. This problem is closely related to Exercise M.1 from Chapter 4.

6.6. Decompose the characters of the representations of the icosahedral group on the sets of faces, edges, and vertices into irreducible characters.

6.7. The group $S_5$ operates by conjugation on its normal subgroup $A_5$. How does this action operate on the isomorphism classes of irreducible representations of $A_5$?

6.8. The stabilizer in the icosahedral group of one of the cubes inscribed in a dodecahedron is the tetrahedral group $T$. Decompose the restrictions to $T$ of the irreducible characters of $I$.

6.9. (a) Explain how one can prove that a group is simple by looking at its character table.

(b) Use the character table of the icosahedral group to prove that it is a simple group.

6.10. Determine the character tables for the nonabelian groups of order 12 (see (7.8.1)).

6.11. The character table for the group $G = PSL_2(\mathbb{F}_7)$ is below, with $\gamma = \frac12(-1 + \sqrt7\,i)$, $\gamma' = \frac12(-1 - \sqrt7\,i)$.


            (1)  (21)  (24)        (24)        (42)  (56)
             1    a     b           c           d     e
$\chi_1$     1    1     1           1           1     1
$\chi_2$     3   -1     $\gamma$    $\gamma'$   1     0
$\chi_3$     3   -1     $\gamma'$   $\gamma$    1     0
$\chi_4$     6    2    -1          -1           0     0
$\chi_5$     7   -1     0           0          -1     1
$\chi_6$     8    0     1           1           0    -1


(a) Use it to give two proofs that this group is simple.
(b) Identify, so far as possible, columns that correspond to the conjugacy classes of the elements

$\begin{pmatrix} 1 & 1 \\ & 1 \end{pmatrix}, \quad \begin{pmatrix} 2 & \\ & 4 \end{pmatrix},$

and find matrices that represent the remaining conjugacy classes.

(c) $G$ operates on the set of eight one-dimensional subspaces of $\mathbb{F}_7^2$. Decompose the associated character into irreducible characters.


Section 7 Schur’s Lemma 


7.1. Prove a converse to Schur’s Lemma: If $\rho$ is a representation, and if the only $G$-invariant linear operators on $V$ are multiplications by scalars, then $\rho$ is irreducible.

7.2. Let $A$ be the standard representation (10.1.3) of the symmetric group $S_3$, and let

B= ak . Use the averaging process to produce a $G$-invariant linear operator from left multiplication by $B$.

7.3.
11 -1 -1 -1
The matrices Ry = 1],Ry=]-1 1 | define a representation R of the
1 -1 -1

group $S_3$. Let $\varphi$ be the linear transformation $\mathbb{C}^1 \to \mathbb{C}^3$ whose matrix is $(1, 0, 0)^t$. Use the averaging method to produce a $G$-invariant linear transformation from $\varphi$, using the sign representation $\Sigma$ of (10.1.4) on $\mathbb{C}^1$ and the representation $R$ on $\mathbb{C}^3$.
7.4. Let $\rho$ be a representation of $G$ and let $C$ be a conjugacy class in $G$. Show that the linear operator $T = \sum_{g \in C} \rho_g$ is $G$-invariant.

7.5. Let $\rho$ be a representation of a group $G$ on $V$, and let $\chi$ be a character of $G$, not necessarily the character of $\rho$. Prove that the linear operator $T = \sum_g \chi(g)\rho_g$ on $V$ is $G$-invariant.

7.6. Compute the matrix of the operator $F$ of Lemma 10.8.1, and use the matrix to verify the formula for its trace.


Section 8 Representations of $SU_2$

8.1. Calculate the four-dimensional volume of the 4-ball $B^4$ of radius $r$ in $\mathbb{R}^4$, the locus $x_0^2 + \cdots + x_3^2 \le r^2$, by slicing with three-dimensional slices. Check your answer by differentiating.

8.2. Verify the associative law $[Q[Pf]] = [(QP)f]$ for the operation (10.9.3).

8.3. Prove that the orthogonal representation (9.4.1) $SU_2 \to SO_3$ is irreducible.

8.4. Left multiplication defines a representation of $SU_2$ on the space $\mathbb{R}^4$ with coordinates $x_0, \ldots, x_3$, as in Section 9.3. Decompose the associated complex representation into irreducible representations.

8.5. Use Theorem 10.9.14 to determine the irreducible representations of the rotation group $SO_3$.

8.6. (representations of the circle group) All representations here are assumed to be differentiable functions of $\theta$. Let $G$ be the circle group $\{e^{i\theta}\}$.

(a) Let $\rho$ be a representation of $G$ on a vector space $V$. Show that there exists a positive definite $G$-invariant Hermitian form on $V$.

(b) Prove Maschke’s Theorem for $G$.

(c) Describe the representations of $G$ in terms of one-parameter groups, and use that description to prove that the irreducible representations are one-dimensional.

(d) Verify the orthogonality relations, using an analogue of the Hermitian product (10.9.6).

8.7. Using the results of Exercise 8.6, determine the irreducible representations of the orthogonal group $O_2$.


Miscellaneous Problems 


M.1. The representations in this problem are real. A molecule $M$ in ‘Flatland’ (a two-dimensional world) consists of three like atoms $a_1$, $a_2$, $a_3$ forming a triangle. The triangle is equilateral at time $t_0$, its center is at the origin, and $a_1$ is on the positive $x$-axis. The group $G$ of symmetries of $M$ at time $t_0$ is the dihedral group $D_3$. We list the velocities of the individual atoms at $t_0$ and call the resulting six-dimensional vector $v = (v_1, v_2, v_3)^t$ the state of $M$. The operation of $G$ on the space $V$ of state vectors defines a six-dimensional matrix representation $S$. For example, the rotation $\rho$ by $2\pi/3$ about the origin permutes the atoms cyclically, and at the same time it rotates them.

(a) Let $r$ be the reflection about the $x$-axis. Determine the matrices $S_\rho$ and $S_r$.

(b) Determine the space $W$ of vectors fixed by $S_\rho$, and show that $W$ is $G$-invariant.

(c) Decompose $W$ and $V$ explicitly into direct sums of irreducible $G$-invariant subspaces.
(d) Explain the subspaces found in (c) in terms of motions and vibrations of the molecule.


M.2. What can be said about a group that has exactly three irreducible characters, of dimensions 1, 2, and 3, respectively?

M.3. Let $\rho$ be a representation of a group $G$. In each of the following cases, decide whether or not $\rho'$ is a representation, and whether or not it is necessarily isomorphic to $\rho$.

(a) $x$ is a fixed element of $G$, and $\rho'_g = \rho_{xgx^{-1}}$.

(b) $\varphi$ is an automorphism of $G$, and $\rho'_g = \rho_{\varphi(g)}$.

(c) $\sigma$ is a one-dimensional representation of $G$, and $\rho'_g = \sigma_g\rho_g$.

M.4. Prove that an element $z$ of a group $G$ is in the center of $G$ if and only if for all irreducible representations $\rho$, $\rho(z)$ is multiplication by a scalar.

M.5. Let $A$, $B$ be commuting matrices such that some positive power of each matrix is the identity. Prove that there is an invertible matrix $P$ such that $PAP^{-1}$ and $PBP^{-1}$ are both diagonal.

M.6. Let $\rho$ be an irreducible representation of a finite group $G$. How unique is the positive definite $G$-invariant Hermitian form?

M.7. Describe the commutator subgroup of a group $G$ in terms of the character table.

M.8. Prove that a finite simple group that is not of prime order has no nontrivial representation of dimension 2.

*M.9. Let $H$ be a subgroup of index 2 of a finite group $G$. Let $a$ be an element of $G$ that is not in $H$, so that $H$ and $aH$ are the two cosets of $H$.

(a) Given a matrix representation $S: H \to GL_n$ of the subgroup $H$, the induced representation $\operatorname{ind} S: G \to GL_{2n}$ of the group $G$ is defined by

$(\operatorname{ind} S)_h = \begin{pmatrix} S_h & 0 \\ 0 & S_{a^{-1}ha} \end{pmatrix}, \qquad (\operatorname{ind} S)_g = \begin{pmatrix} 0 & S_{ga} \\ S_{a^{-1}g} & 0 \end{pmatrix}$

for $h$ in $H$ and $g$ in $aH$. Prove that $\operatorname{ind} S$ is a representation of $G$, and describe its character.

Note: The element $a^{-1}ha$ will be in $H$, but because $a$ is not in $H$, it needn’t be a conjugate of $h$ in $H$.

(b) If $R: G \to GL_n$ is a matrix representation of $G$, we may restrict it to $H$. We denote the restriction by $\operatorname{res} R: H \to GL_n$. Prove that $\operatorname{res}(\operatorname{ind} S) \approx S \oplus S'$, where $S'$ is the conjugate representation defined by $S'_h = S_{a^{-1}ha}$.

(c) Prove Frobenius reciprocity: $\langle\chi_{\operatorname{ind} S}, \chi_R\rangle = \langle\chi_S, \chi_{\operatorname{res} R}\rangle$.

(d) Let $S$ be an irreducible representation of $H$. Use Frobenius reciprocity to prove that if $S$ is not isomorphic to the conjugate representation $S'$, then the induced representation $\operatorname{ind} S$ is irreducible, and on the other hand, if $S$ and $S'$ are isomorphic, then $\operatorname{ind} S$ is a sum of two non-isomorphic representations of $G$.

*M.10. Let $H$ be a subgroup of index 2 of a group $G$, and let $R$ be a matrix representation of $G$. Let $R'$ denote the representation defined by $R'_g = R_g$ if $g \in H$, and $R'_g = -R_g$ otherwise.

(a) Show that $R'$ is isomorphic to $R$ if and only if the character of $R$ is identically zero on the coset $gH$ not equal to $H$.
(b) Use Frobenius reciprocity (Exercise M.9) to show that $\operatorname{ind}(\operatorname{res} R) \approx R \oplus R'$.
(c) Suppose that $R$ is irreducible. Show that if $R$ is not isomorphic to $R'$, then $\operatorname{res} R$ is irreducible, and if these two representations are isomorphic, then $\operatorname{res} R$ is a sum of two irreducible representations of $H$.


*M.11. Derive the character table of $S_n$ using induced representations from $A_n$, when
(a) $n = 3$, (b) $n = 4$, (c) $n = 5$.

*M.12. Derive the character table of the dihedral group $D_n$, using induced representations from $C_n$.

M.13. Let $G$ be a finite subgroup of $GL_n(\mathbb{C})$. Prove that if $\sum_g \operatorname{trace} g = 0$, then $\sum_g g = 0$.

M.14. Let $\rho: G \to GL(V)$ be a two-dimensional representation of a finite group $G$, and assume that 1 is an eigenvalue of $\rho_g$ for every $g$ in $G$. Prove that $\rho$ is a sum of two one-dimensional representations.

M.15. Let $\rho: G \to GL_n(\mathbb{C})$ be an irreducible representation of a finite group $G$. Given a representation $\sigma: GL_n \to GL(V)$ of $GL_n$, we can consider the composition $\sigma \circ \rho$ as a representation of $G$.

(a) Determine the character of the representation obtained in this way when $\sigma$ is left multiplication of $GL_n$ on the space $V$ of $n \times n$ matrices. Decompose $\sigma \circ \rho$ into irreducible representations in this case.

(b) Determine the character of $\sigma \circ \rho$ when $\sigma$ is the operation of conjugation on $\mathbb{C}^{n \times n}$.


CHAPTER 11 


Rings 


Bitte vergiß alles, was Du auf der Schule gelernt hast;
denn Du hast es nicht gelernt. 


—Edmund Landau 


11.1 DEFINITION OF A RING 


Rings are algebraic structures closed under addition, subtraction, and multiplication, but not 
under division. The integers form our basic model for this concept. 

Before going to the definition of a ring, we look at a few examples, subrings of the 
complex numbers. A subring of C is a subset which is closed under addition, subtraction and 
multiplication, and which contains 1. 


• The Gauss integers, the complex numbers of the form $a + bi$, where $a$ and $b$ are integers, form a subring of $\mathbb{C}$ that we denote by $\mathbb{Z}[i]$:

(11.1.1)   $\mathbb{Z}[i] = \{a + bi \mid a, b \in \mathbb{Z}\}.$


Its elements are the points of a square lattice in the complex plane. 


We can form a subring $\mathbb{Z}[\alpha]$ analogous to the ring of Gauss integers, starting with any complex number $\alpha$: the subring generated by $\alpha$. This is the smallest subring of $\mathbb{C}$ that contains $\alpha$, and it can be described in a general way. If a ring contains $\alpha$, then it contains all positive powers of $\alpha$ because it is closed under multiplication. It also contains sums and differences of such powers, and it contains 1. Therefore it contains every complex number $\beta$ that can be expressed as an integer combination of powers of $\alpha$, or, saying this another way, can be obtained by evaluating a polynomial with integer coefficients at $\alpha$:

(11.1.2)   $\beta = a_n\alpha^n + \cdots + a_1\alpha + a_0$, where $a_i$ are in $\mathbb{Z}$.

On the other hand, the set of all such numbers is closed under the operations $+$, $-$, and $\times$, and it contains 1. So it is the subring generated by $\alpha$.

In most cases, $\mathbb{Z}[\alpha]$ will not be represented as a lattice in the complex plane. For example, the ring $\mathbb{Z}[\tfrac12]$ consists of the rational numbers that can be expressed as a polynomial in $\tfrac12$ with integer coefficients. These rational numbers can be described simply as those whose denominators are powers of 2. They form a dense subset of the real line.
• A complex number $\alpha$ is algebraic if it is a root of a (nonzero) polynomial with integer coefficients, that is, if some expression of the form (11.1.2) evaluates to zero. If there is no polynomial with integer coefficients having $\alpha$ as a root, $\alpha$ is transcendental. The numbers $e$ and $\pi$ are transcendental, though it isn’t very easy to prove this.

When $\alpha$ is transcendental, two distinct polynomial expressions (11.1.2) represent distinct complex numbers. Then the elements of the ring $\mathbb{Z}[\alpha]$ correspond bijectively to polynomials $p(x)$ with integer coefficients, by the rule $p(x) \rightsquigarrow p(\alpha)$. When $\alpha$ is algebraic there will be many polynomial expressions that represent the same complex number. Some examples of algebraic numbers are: $i + 3$, $1/7$, $7 + \sqrt{2}$, and $\sqrt{3} + \sqrt{-5}$.


The definition of a ring is similar to that of field (3.2.2). The only difference is that 
multiplicative inverses aren’t required: 


Definition 11.1.3 $(+, -, \times, 1)$ A ring $R$ is a set with two laws of composition $+$ and $\times$, called addition and multiplication, that satisfy these axioms:

(a) With the law of composition $+$, $R$ is an abelian group that we denote by $R^+$; its identity is denoted by 0.

(b) Multiplication is commutative and associative, and has an identity denoted by 1.

(c) distributive law: For all $a$, $b$, and $c$ in $R$, $(a + b)c = ac + bc$.


A subring of a ring is a subset that is closed under the operations of addition, subtraction, 
and multiplication and that contains the element 1. 


Note: There is a related concept, of a noncommutative ring: a structure that satisfies all axioms of (11.1.3) except the commutative law for multiplication. The set of all real $n \times n$ matrices is one example. Since we won’t be studying noncommutative rings, we use the word “ring” to mean “commutative ring.” □


Aside from subrings of C, the most important rings are polynomial rings. A polynomial 
in x with coefficients in a ring R is an expression of the form 


(11.1.4)   $a_nx^n + \cdots + a_1x + a_0,$
with a; in R. The set of these polynomials forms a ring that we discuss in the next section. 


Another example: The set $R$ of continuous real-valued functions of a real variable $x$ forms a ring, with addition and multiplication of functions: $[f + g](x) = f(x) + g(x)$ and $[fg](x) = f(x)g(x)$.

There is a ring that contains just one element, 0; it is called the zero ring. In the 
definition of a field (3.2.2), the set F* obtained by deleting 0 is a group that contains the 
multiplicative identity 1. So 1 is not equal to 0 in a field. The relation 1 = 0 hasn’t been ruled 
out in a ring, but it occurs only once: 


Proposition 11.1.5 A ring R in which the elements 1 and 0 are equal is the zero ring. 
Proof. We first note that $0a = 0$ for every element $a$ of a ring $R$. The proof is the same as for vector spaces: $0 = 0a - 0a = (0 - 0)a = 0a$. Assume that $1 = 0$ in $R$, and let $a$ be any element. Then $a = 1a = 0a = 0$. The only element of $R$ is 0. □

Though elements of a ring aren’t required to have multiplicative inverses, a particular element may have an inverse, and the inverse is unique if it exists.

• A unit of a ring is an element that has a multiplicative inverse.

The units in the ring of integers are 1 and $-1$, and the units in the ring of Gauss integers are $\pm 1$ and $\pm i$. The units in the ring $\mathbb{R}[x]$ of real polynomials are the nonzero constant polynomials. Fields are rings in which $0 \ne 1$ and in which every nonzero element is a unit.

The identity element 1 of a ring is always a unit, and any reference to “the” unit 
element in R refers to the identity element. The ambiguous term “unit” is poorly chosen, 
but it is too late to change it. 
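As a small illustration of the statement about Gauss integers, one can search a finite box for units. This is only a sketch under stated assumptions: a Gauss integer $a + bi$ is represented by the pair `(a, b)`, the helper name `is_unit` is invented here, and the search is limited to $|a|, |b| \le 5$, so it illustrates the claim rather than proving it. The test uses the fact that the inverse of a nonzero $a + bi$ in $\mathbb{C}$ is $(a - bi)/(a^2 + b^2)$.

```python
def is_unit(a, b):
    """True if the Gauss integer a + bi has an inverse in Z[i]."""
    norm = a * a + b * b
    if norm == 0:
        return False            # zero has no inverse
    # The inverse (a - bi)/norm must again have integer coordinates.
    return a % norm == 0 and b % norm == 0

units = [(a, b) for a in range(-5, 6) for b in range(-5, 6) if is_unit(a, b)]
print(units)   # pairs (a, b) standing for a + bi
```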


11.2 POLYNOMIAL RINGS 


• A polynomial with coefficients in a ring $R$ is a (finite) linear combination of powers of the variable:

(11.2.1)   $f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0,$

where the coefficients $a_i$ are elements of $R$. Such an expression is sometimes called a formal polynomial, to distinguish it from a polynomial function. Every formal polynomial with real coefficients determines a polynomial function on the real numbers. But we use the word polynomial to mean formal polynomial.

The set of polynomials with coefficients in a ring $R$ will be denoted by $R[x]$. Thus $\mathbb{Z}[x]$ denotes the set of polynomials with integer coefficients, the set of integer polynomials.

The monomials $x^i$ are considered independent. So if

(11.2.2)   $g(x) = b_mx^m + b_{m-1}x^{m-1} + \cdots + b_1x + b_0$

is another polynomial with coefficients in $R$, then $f(x)$ and $g(x)$ are equal if and only if $a_i = b_i$ for all $i = 0, 1, 2, \ldots$.

• The degree of a nonzero polynomial, which may be denoted by $\deg f$, is the largest integer $n$ such that the coefficient $a_n$ of $x^n$ is not zero. A polynomial of degree zero is called a constant polynomial. The zero polynomial is also called a constant polynomial, but its degree will not be defined.

The nonzero coefficient of highest degree of a polynomial is its leading coefficient, and 
a monic polynomial is one whose leading coefficient is 1. 


The possibility that some coefficients of a polynomial may be zero creates a nuisance. 
We have to disregard terms with zero coefficient, so the polynomial f(x) can be written 
in more than one way. This is irritating because it isn’t an interesting point. One way to 
avoid ambiguity is to imagine listing the coefficients of all monomials, whether zero or not. 
This allows efficient verification of the ring axioms. So for the purpose of defining the ring 
operations, we write a polynomial as 


(11.2.3)   $f(x) = a_0 + a_1x + a_2x^2 + \cdots,$

where the coefficients $a_i$ are all in the ring $R$ and only finitely many of them are different from zero. This polynomial is determined by its vector (or sequence) of coefficients $a_i$:

(11.2.4)   $a = (a_0, a_1, \ldots),$

where $a_i$ are elements of $R$, all but a finite number zero. Every such vector corresponds to a polynomial.

When $R$ is a field, these infinite vectors form the vector space $Z$ with the infinite basis $e_i$ that was defined in (3.7.2). The vector $e_i$ corresponds to the monomial $x^i$, and the monomials form a basis of the space of all polynomials.


The definitions of addition and multiplication of polynomials mimic the familiar 
operations on polynomial functions. If f(x) and g(x) are polynomials, then with notation 
as above, their sum is

(11.2.5)   $f(x) + g(x) = (a_0 + b_0) + (a_1 + b_1)x + \cdots = \sum_k (a_k + b_k)x^k,$

where the notation $(a_i + b_i)$ refers to addition in $R$. So if we think of a polynomial as a vector, addition is vector addition: $a + b = (a_0 + b_0, a_1 + b_1, \ldots)$.

The product of polynomials $f$ and $g$ is computed by expanding the product:

(11.2.6)   $f(x)g(x) = (a_0 + a_1x + \cdots)(b_0 + b_1x + \cdots) = \sum_{i,j} a_ib_jx^{i+j},$

where the products $a_ib_j$ are to be evaluated in the ring $R$. There will be finitely many nonzero coefficients $a_ib_j$. This is a correct formula, but the right side is not in the standard form (11.2.3), because the same monomial $x^n$ appears several times, once for each pair $i, j$ of indices such that $i + j = n$. So terms have to be collected on the right side. This leads to the definition

(11.2.7)   $f(x)g(x) = p_0 + p_1x + p_2x^2 + \cdots,$ with $p_k = \sum_{i+j=k} a_ib_j,$

$p_0 = a_0b_0, \quad p_1 = a_0b_1 + a_1b_0, \quad p_2 = a_0b_2 + a_1b_1 + a_2b_0, \quad \ldots$

Each $p_k$ is evaluated using the laws of composition in the ring. However, when making computations, it may be desirable to defer the collection of terms temporarily.
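Formulas (11.2.5) and (11.2.7) translate directly into operations on coefficient vectors. The following sketch assumes integer coefficients and stores a polynomial as a Python list `[a_0, a_1, ...]`, lowest degree first; the function names `poly_add` and `poly_mul` are illustrative choices, not notation from the text.

```python
def poly_add(a, b):
    """Coefficient vector of f + g, as in (11.2.5)."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))   # pad the shorter vector with zeros
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def poly_mul(a, b):
    """Coefficient vector of f * g: p_k is the sum of a_i * b_j with i + j = k."""
    if not a or not b:
        return []
    p = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p[i + j] += ai * bj   # collect the term a_i b_j x^(i+j)
    return p

# (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(poly_mul([1, 2, 3], [4, 5]))
```

The double loop is formula (11.2.6) with the collection of terms done on the fly.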


Proposition 11.2.8 There is a unique commutative ring structure on the set of polynomials 
R[x] having these properties: 

• Addition of polynomials is defined by (11.2.5).

• Multiplication of polynomials is defined by (11.2.7).

• The ring $R$ becomes a subring of $R[x]$ when the elements of $R$ are identified with the constant polynomials.

Since polynomial algebra is familiar and since the proof of this proposition has no interesting features, we omit it. □


Division with remainder is an important operation on polynomials. 
Proposition 11.2.9 Division with Remainder. Let $R$ be a ring, let $f$ be a monic polynomial and let $g$ be any polynomial, both with coefficients in $R$. There are uniquely determined polynomials $q$ and $r$ in $R[x]$ such that

$g(x) = f(x)q(x) + r(x),$

and such that the remainder $r$, if it is not zero, has degree less than the degree of $f$. Moreover, $f$ divides $g$ in $R[x]$ if and only if the remainder $r$ is zero.

The proof of this proposition follows the algorithm for division of polynomials that one learns in school. □

Corollary 11.2.10 Division with remainder can be done whenever the leading coefficient of $f$ is a unit. In particular, it can be done whenever the coefficient ring is a field and $f \ne 0$.

If the leading coefficient is a unit $u$, we can factor it out of $f$. □

However, one cannot divide $x^2 + 1$ by $2x + 1$ in the ring $\mathbb{Z}[x]$ of integer polynomials.
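The school algorithm behind Proposition 11.2.9 can be sketched as follows. The representation is an assumption made for the illustration: polynomials over $\mathbb{Z}$ are lists of integer coefficients, lowest degree first, and the name `divmod_monic` is invented here. The function refuses a divisor that is not monic, which is what happens with $2x + 1$.

```python
def divmod_monic(g, f):
    """Return (q, r) with g = f*q + r and deg r < deg f, for monic f."""
    if not f or f[-1] != 1:
        raise ValueError("divisor must be monic")
    d = len(f) - 1                       # degree of f
    r = list(g)                          # running remainder
    q = [0] * max(len(g) - d, 0)
    for k in range(len(r) - 1, d - 1, -1):
        c = r[k]                         # current leading coefficient
        q[k - d] = c
        for i, fi in enumerate(f):
            r[k - d + i] -= c * fi       # subtract c * x^(k-d) * f
    return q, r[:d]

# x^3 + 2x + 5 = (x^2 + 1) * x + (x + 5)
print(divmod_monic([5, 2, 0, 1], [1, 0, 1]))
```

Because the leading coefficient of $f$ is 1, no division in the coefficient ring is ever needed, which is why the procedure works over any ring.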


Corollary 11.2.11 Let $g(x)$ be a polynomial in $R[x]$, and let $\alpha$ be an element of $R$. The remainder of division of $g(x)$ by $x - \alpha$ is $g(\alpha)$. Thus $x - \alpha$ divides $g$ in $R[x]$ if and only if $g(\alpha) = 0$.

This corollary is proved by substituting $x = \alpha$ into the equation $g(x) = (x - \alpha)q(x) + r$ and noting that $r$ is a constant. □
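Corollary 11.2.11 is what makes synthetic division (Horner's scheme) work: dividing by $x - \alpha$ produces $g(\alpha)$ as the remainder. A minimal sketch, assuming integer coefficients stored lowest degree first and using the invented name `divide_by_linear`:

```python
def divide_by_linear(g, alpha):
    """Divide g by x - alpha; return (quotient, remainder)."""
    partial = []
    value = 0
    for c in reversed(g):            # process coefficients from the top degree
        value = value * alpha + c    # Horner step
        partial.append(value)
    remainder = partial.pop() if partial else 0   # last value is g(alpha)
    quotient = list(reversed(partial))            # back to lowest degree first
    return quotient, remainder

g = [2, -3, 1]                       # x^2 - 3x + 2
print(divide_by_linear(g, 1))        # x - 1 divides g
print(divide_by_linear(g, 3))        # remainder is g(3)
```

The intermediate Horner values are the coefficients of the quotient, so one pass yields both $q$ and $g(\alpha)$.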


Polynomials are fundamental to the theory of rings, and we will also want to use 
polynomials in several variables. There is no major change in the definitions. 


• A monomial is a formal product of some variables $x_1, \ldots, x_n$ of the form

$x_1^{i_1}x_2^{i_2}\cdots x_n^{i_n},$

where the exponents $i_\nu$ are non-negative integers. The degree of a monomial, sometimes called the total degree, is the sum $i_1 + \cdots + i_n$.

An $n$-tuple $(i_1, \ldots, i_n)$ is called a multi-index, and vector notation $i = (i_1, \ldots, i_n)$ for multi-indices is convenient. Using multi-index notation, we may write a monomial symbolically as $x^i$:

(11.2.12)   $x^i = x_1^{i_1}x_2^{i_2}\cdots x_n^{i_n}.$

The monomial $x^0$, with $0 = (0, \ldots, 0)$, is denoted by 1. A polynomial in the variables $x_1, \ldots, x_n$, with coefficients in a ring $R$, is a linear combination of finitely many monomials, with coefficients in $R$. With multi-index notation, a polynomial $f(x) = f(x_1, \ldots, x_n)$ can be written in exactly one way in the form

(11.2.13)   $f(x) = \sum_i a_ix^i,$

where $i$ runs through all multi-indices $(i_1, \ldots, i_n)$, the coefficients $a_i$ are in $R$, and only finitely many of these coefficients are different from zero.

A polynomial in which all monomials with nonzero coefficients have (total) degree d 
is called a homogeneous polynomial. 

Using multi-index notation, formulas (11.2.5) and (11.2.7) define addition and multi- 
plication of polynomials in several variables, and the analogue of Proposition 11.2.8 is true. 
However, division with remainder requires more thought. We will come back to it below 
(see Corollary 11.3.9). 
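Multi-index notation suggests a simple data structure. In the sketch below, which is an assumption of this illustration rather than anything in the text, a polynomial in $n$ variables is a dictionary mapping each multi-index (a tuple of exponents) to its nonzero integer coefficient; the names `mv_mul` and `is_homogeneous` are hypothetical.

```python
def mv_mul(f, g):
    """Product of two polynomials stored as {multi-index: coefficient}."""
    out = {}
    for i, a in f.items():
        for j, b in g.items():
            k = tuple(s + t for s, t in zip(i, j))   # x^i * x^j = x^(i+j)
            out[k] = out.get(k, 0) + a * b
    return {k: c for k, c in out.items() if c != 0}  # drop zero coefficients

def is_homogeneous(f):
    """True if all monomials with nonzero coefficient share one total degree."""
    return len({sum(i) for i, c in f.items() if c != 0}) <= 1

f = {(1, 0): 1, (0, 1): 1}     # x1 + x2
g = {(1, 0): 1, (0, 1): -1}    # x1 - x2
print(mv_mul(f, g))            # x1^2 - x2^2
```

Multiplying monomials adds their multi-indices, so the one-variable formula (11.2.7) carries over with $i + j$ read as vector addition.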

The ring of polynomials with coefficients in R is usually denoted by one of the symbols 


(11.2.14)   $R[x_1, \ldots, x_n]$ or $R[x],$

where the symbol $x$ is understood to refer to the set of variables $\{x_1, \ldots, x_n\}$. When no set
of variables has been introduced, R[x] denotes the polynomial ring in one variable. 


11.3 HOMOMORPHISMS AND IDEALS 


• A ring homomorphism $\varphi: R \to R'$ is a map from one ring to another which is compatible with the laws of composition and which carries the unit element 1 of $R$ to the unit element 1 in $R'$, that is, a map such that, for all $a$ and $b$ in $R$,

(11.3.1)   $\varphi(a + b) = \varphi(a) + \varphi(b), \quad \varphi(ab) = \varphi(a)\varphi(b), \quad \text{and} \quad \varphi(1) = 1.$

The map

(11.3.2)   $\varphi: \mathbb{Z} \to \mathbb{F}_p$


that sends an integer to its congruence class modulo p is a ring homomorphism. 

An isomorphism of rings is a bijective homomorphism, and if there is an isomorphism 
from $R$ to $R'$, the two rings are said to be isomorphic. We often use the notation $R \approx R'$ to indicate that two rings $R$ and $R'$ are isomorphic.

A word about the third condition of (11.3.1): The assumption that a homomorphism $\varphi$ is compatible with addition implies that it is a homomorphism from the additive group $R^+$ of $R$ to the additive group $R'^+$. A group homomorphism carries the identity to the identity, so $\varphi(0) = 0$. But we can’t conclude that $\varphi(1) = 1$ from compatibility with multiplication, so that condition must be listed separately. ($R$ is not a group with respect to $\times$.) For example, the zero map $R \to R'$ that sends all elements of $R$ to zero is compatible with $+$ and $\times$, but it doesn’t send 1 to 1 unless $1 = 0$ in $R'$. The zero map is not called a ring homomorphism unless $R'$ is the zero ring (see (11.1.5)).

The most important ring homomorphisms are obtained by evaluating polynomials. 
Evaluation of real polynomials at a real number a defines a homomorphism


(11.3.3) R[x] → R, that sends p(x) ⇝ p(a).

One can also evaluate real polynomials at a complex number such as i, to obtain a
homomorphism R[x] → C that sends p(x) ⇝ p(i).


The general formulation of the principle of evaluation of polynomials is this: 


Proposition 11.3.4 Substitution Principle. Let φ: R → R’ be a ring homomorphism, and let
R[x] be the ring of polynomials with coefficients in R.


(a) Let α be an element of R’. There is a unique homomorphism Φ: R[x] → R’ that agrees
with the map φ on constant polynomials, and that sends x ⇝ α.

(b) More generally, given elements α_1, ..., α_n of R’, there is a unique homomorphism
Φ: R[x_1, ..., x_n] → R’, from the polynomial ring in n variables to R’, that agrees with
φ on constant polynomials and that sends x_ν ⇝ α_ν, for ν = 1, ..., n.


Proof. (a) Let us denote the image φ(a) of an element a of R by a’. Using the fact that
Φ is a homomorphism that restricts to φ on R and sends x to α, we see that it acts on a
polynomial f(x) = Σ a_i x^i by sending


(11.3.5) Φ(Σ a_i x^i) = Σ Φ(a_i) Φ(x)^i = Σ a’_i α^i.


In words, Φ acts on the coefficients of a polynomial as φ, and it substitutes α for x. Since this
formula describes Φ, we have proved the uniqueness of the substitution homomorphism.
To prove its existence, we take this formula as the definition of Φ, and we show that Φ is a
homomorphism R[x] → R’. It is clear that 1 is sent to 1, and it is easy to verify compatibility
with addition of polynomials. Compatibility with multiplication is checked using formula
(11.2.6):


Φ(fg) = Φ(Σ a_i b_j x^(i+j)) = Σ Φ(a_i b_j x^(i+j)) = Σ_{i,j} a’_i b’_j α^(i+j)
      = (Σ_i a’_i α^i)(Σ_j b’_j α^j) = Φ(f)Φ(g).


With multi-index notation, the proof of (b) becomes the same as that of (a). □
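The multiplicativity check in the proof can be tried numerically. The following is a minimal sketch, assuming the special case in which φ is reduction modulo m from Z to Z/m; the list representation of polynomials and the names poly_mul, poly_add, and substitute are illustrative choices, not notation from the text.

```python
# Sketch of the substitution map of (11.3.5), assuming phi: Z -> Z/m is
# reduction mod m. A polynomial a_0 + a_1 x + ... is the list [a_0, a_1, ...].

def poly_mul(f, g):
    """Product of two coefficient lists, following formula (11.2.6)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def poly_add(f, g):
    """Sum of two coefficient lists, padding the shorter one with zeros."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

def substitute(f, alpha, m):
    """Phi(f): reduce the coefficients mod m and substitute alpha for x."""
    return sum((a % m) * pow(alpha, i, m) for i, a in enumerate(f)) % m

f = [1, 0, 3]        # 1 + 3x^2
g = [-2, 5]          # -2 + 5x
m, alpha = 7, 4      # hypothetical sample values

# Phi(fg) = Phi(f) Phi(g) and Phi(f + g) = Phi(f) + Phi(g) in Z/7
lhs_mul = substitute(poly_mul(f, g), alpha, m)
rhs_mul = (substitute(f, alpha, m) * substitute(g, alpha, m)) % m
lhs_add = substitute(poly_add(f, g), alpha, m)
rhs_add = (substitute(f, alpha, m) + substitute(g, alpha, m)) % m
print(lhs_mul == rhs_mul, lhs_add == rhs_add)
```

A check on sample values is of course not a proof; it only illustrates what the computation in the proof asserts.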


Here is a simple example of the substitution principle in which the coefficient ring 
R changes. Let ψ: R → S be a ring homomorphism. Composing ψ with the inclusion of
S as a subring of the polynomial ring S[x], we obtain a homomorphism φ: R → S[x].
The substitution principle asserts that there is a unique extension of φ to a homomorphism
Φ: R[x] → S[x] that sends x ⇝ x. This map operates on the coefficients of a polynomial,
while leaving the variable x fixed. If we denote ψ(a) by a’, then it sends a polynomial
a_n x^n + ... + a_1 x + a_0 to a’_n x^n + ... + a’_1 x + a’_0.

A particularly interesting case is that φ is the homomorphism Z → F_p that sends an
integer a to its residue ā modulo p. This map extends to a homomorphism Φ: Z[x] → F_p[x],
defined by


(11.3.6) f(x) = a_n x^n + ... + a_0 ⇝ ā_n x^n + ... + ā_0 = f̄(x),

where ā_i is the residue class of a_i modulo p. It is natural to call the polynomial f̄(x) the
residue of f(x) modulo p.
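As a small illustration of (11.3.6), here is a sketch that reduces integer polynomials coefficientwise and checks, on sample data, that taking residues is compatible with multiplication. The helper names residue and poly_mul are hypothetical, and polynomials are stored as coefficient lists [a_0, a_1, ...].

```python
# Sketch: the residue of an integer polynomial modulo p, as in (11.3.6).

def poly_mul(f, g):
    """Product of two integer polynomials given as coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def residue(f, p):
    """Replace each coefficient by its residue modulo p."""
    return [a % p for a in f]

# The residue of (x + 1)^2 = x^2 + 2x + 1 modulo 2 is x^2 + 1.
square_mod_2 = residue(poly_mul([1, 1], [1, 1]), 2)

# Residue of a product equals the product of residues (sample values).
f, g, p = [3, -4, 7], [5, 9], 5
left = residue(poly_mul(f, g), p)
right = residue(poly_mul(residue(f, p), residue(g, p)), p)
print(square_mod_2, left == right)
```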

Another example: Let R be any ring, and let P denote the polynomial ring R[x]. One 
can use the substitution principle to construct an isomorphism 


(11.3.7) R[x, y] → P[y] = (R[x])[y].


This is stated and proved below in Proposition 11.3.8. The domain is the ring of polynomials 
in two variables x and y, and the range is the ring of polynomials in y whose coefficients 
are polynomials in x. The statement that these rings are isomorphic is a formalization of the 
procedure of collecting terms of like degree in y in a polynomial f(x, y). For example, 


x^4 y + x^3 - 3x^2 y + y^2 + 2 = y^2 + (x^4 - 3x^2) y + (x^3 + 2).


This procedure can be useful. For one thing, one may end up with a polynomial that is monic 
in the variable y, as happens in the example above. If so, one can do division with remainder 
(see Corollary 11.3.9 below). 
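Collecting terms of like degree in y is easy to mechanize. The sketch below assumes a polynomial in two variables is stored as a dictionary {(i, j): c}, standing for the sum of the terms c x^i y^j; this representation, the function name collect_in_y, and the sample polynomial are chosen only for illustration.

```python
from collections import defaultdict

def collect_in_y(f):
    """Regroup {(i, j): c} by the power j of y.

    The result maps j to the coefficient of y^j, itself a polynomial in x
    stored as {i: c}.
    """
    by_y = defaultdict(dict)
    for (i, j), c in f.items():
        if c != 0:
            by_y[j][i] = by_y[j].get(i, 0) + c
    return dict(by_y)

# Sample polynomial: x^4 y + x^3 - 3 x^2 y + y^2 + 2
f = {(4, 1): 1, (3, 0): 1, (2, 1): -3, (0, 2): 1, (0, 0): 2}
grouped = collect_in_y(f)

# The coefficient of the top power of y is the constant 1, so f is monic in y.
top = max(grouped)
print(grouped[top] == {0: 1})
```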


Proposition 11.3.8 Let x = (x_1, ..., x_m) and y = (y_1, ..., y_n) denote sets of variables.
There is a unique isomorphism R[x, y] → R[x][y], which is the identity on R and which
sends the variables to themselves.


This is very elementary, but it would be boring to verify compatibility of multiplication in 
the two rings directly. 


Proof. We note that since R is a subring of R[x] and R[x] is a subring of R[x][y], R is also a
subring of R[x][y]. Let φ be the inclusion of R into R[x][y]. The substitution principle tells
us that there is a unique homomorphism Φ: R[x, y] → R[x][y], which extends φ and sends
the variables x_μ and y_ν wherever we want. So we can send the variables to themselves.
The map Φ thus constructed is the required isomorphism. It isn’t difficult to see that Φ is
bijective. One way to show this would be to use the substitution principle again, to define
the inverse map. □


Corollary 11.3.9 Let f(x, y) and g(x, y) be polynomials in two variables, elements of
R[x, y]. Suppose that, when regarded as a polynomial in y, f is a monic polynomial
of degree m. There are uniquely determined polynomials q(x, y) and r(x, y) such that
g = fq + r, and such that if r(x, y) is not zero, its degree in the variable y is less than m.


This follows from Propositions 11.2.9 and 11.3.8. □


Another case in which one can describe homomorphisms easily is when the domain is 
the ring of integers. 


Proposition 11.3.10 Let R be a ring. There is exactly one homomorphism φ: Z → R from
the ring of integers to R. It is the map defined, for n > 0, by φ(n) = 1 + ... + 1 (n terms)
and φ(-n) = -φ(n).


Sketch of Proof. Let φ: Z → R be a homomorphism. By definition of a homomorphism,
φ(1) = 1 and φ(n + 1) = φ(n) + φ(1). This recursive definition describes φ on the natural
numbers, and together with φ(-n) = -φ(n) if n > 0 and φ(0) = 0, it determines φ uniquely.
So it is the only map Z → R that could be a homomorphism, and it isn’t hard to convince
oneself that it is one. To prove this formally, one would go back to the definitions of addition
and multiplication of integers (see Appendix). □


Proposition 11.3.10 allows us to identify the image of an integer in an arbitrary ring R.
We interpret the symbol 3, for example, as the element 1 + 1 + 1 of R.
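The recursive description of the map Z → R translates directly into a loop. In this sketch a ring is handed over by its zero element, its unit element, its addition, and its negation; that interface and the name image_of_integer are assumptions made for the example, with Z/6 serving as a sample ring.

```python
def image_of_integer(n, zero, one, add, neg):
    """Image of the integer n in a ring R: 1 + ... + 1 (|n| terms), negated if n < 0."""
    total = zero
    for _ in range(abs(n)):
        total = add(total, one)
    return neg(total) if n < 0 else total

# Sample ring R = Z/6, described by its operations.
mod6 = dict(zero=0, one=1,
            add=lambda a, b: (a + b) % 6,
            neg=lambda a: (-a) % 6)

three = image_of_integer(3, **mod6)      # the element 1 + 1 + 1 of Z/6
eight = image_of_integer(8, **mod6)
minus_one = image_of_integer(-1, **mod6)
print(three, eight, minus_one)
```

In Z/6 the symbols 8 and 2 denote the same element, which is the kind of identification the proposition makes possible.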


• Let φ: R → R’ be a ring homomorphism. The kernel of φ is the set of elements of R that
map to zero:


(11.3.11) ker φ = {s ∈ R | φ(s) = 0}.


This is the same as the kernel obtained when one regards φ as a homomorphism of additive
groups R^+ → R’^+. So what we have learned about kernels of group homomorphisms
applies. For instance, φ is injective if and only if ker φ = {0}.

As you will recall, the kernel of a group homomorphism is not only a subgroup, it 
is a normal subgroup. Similarly, the kernel of a ring homomorphism is closed under the 
operation of addition, and it has a property that is stronger than closure under multiplication: 


(11.3.12) If s is in ker φ, then for every element r of R, rs is in ker φ.


For, if φ(s) = 0, then φ(rs) = φ(r)φ(s) = φ(r)0 = 0.
This property is abstracted in the concept of an ideal. 


Definition 11.3.13 An ideal I of a ring R is a nonempty subset of R with these properties:


• I is closed under addition, and
• If s is in I and r is in R, then rs is in I.


The kernel of a ring homomorphism is an ideal. 

The peculiar term “ideal” is an abbreviation of the phrase “ideal element” that was
formerly used in number theory. We will see in Chapter 13 how it arose. A good way,
probably a better way, to think of the definition of an ideal is this equivalent formulation:


(11.3.14) I is not empty, and a linear combination r_1 s_1 + ... + r_k s_k
          of elements s_i of I with coefficients r_i in R is in I.


• In any ring R, the multiples of a particular element a form an ideal called the principal
ideal generated by a. An element b of R is in this ideal if and only if b is a multiple of a,
which is to say, if and only if a divides b in R.


There are several notations for this principal ideal:

(11.3.15) (a) = aR = Ra = {ra | r ∈ R}.


The ring R itself is the principal ideal (1), and because of this it is called the unit ideal.
It is the only ideal that contains a unit of the ring. The set consisting of zero alone is the
principal ideal (0), and is called the zero ideal. An ideal I is proper if it is neither the zero
ideal nor the unit ideal.

Every ideal I satisfies the requirements for a subring, except that the unit element 1 of
R will not be in I unless I is the whole ring. Unless I is equal to R, it will not be what we call
a subring.


Examples 11.3.16 


(a) Let φ be the homomorphism R[x] → R defined by substituting the real number 2 for x.
Its kernel, the set of polynomials that have 2 as a root, can be described as the set of
polynomials divisible by x - 2. This is a principal ideal that might be denoted by (x - 2).


(b) Let Φ: R[x, y] → R[t] be the homomorphism that is the identity on the real numbers, and
that sends x ⇝ t^2, y ⇝ t^3. Then it sends g(x, y) ⇝ g(t^2, t^3). The polynomial f(x, y) =
y^2 - x^3 is in the kernel of Φ. We’ll show that the kernel is the principal ideal (f)
generated by f, i.e., that if g(x, y) is a polynomial and if g(t^2, t^3) = 0, then f
divides g. To show this, we regard f as a polynomial in y whose coefficients are
polynomials in x (see (11.3.8)). It is a monic polynomial in y, so we can do division
with remainder: g = fq + r, where q and r are polynomials, and where the remainder
r, if not zero, has degree at most 1 in y. We write the remainder as a polynomial in
y: r(x, y) = r_1(x)y + r_0(x). If g(t^2, t^3) = 0, then both g and fq are in the kernel of Φ,
so r is too: r(t^2, t^3) = r_1(t^2)t^3 + r_0(t^2) = 0. The monomials that appear in r_0(t^2) have
even degree, while those in r_1(t^2)t^3 have odd degree. Therefore, in order for r(t^2, t^3) to
be zero, r_0(x) and r_1(x) must both be zero. Since the remainder is zero, f divides g. □


The notation (a) for a principal ideal is convenient, but it is ambiguous because the ring 
isn’t mentioned. For instance, (x - 2) could stand for an ideal of R[x] or of Z[x], depending
on the circumstances. When several rings are being discussed, a different notation may be 
preferable. 


• The ideal I generated by a set of elements {a_1, ..., a_n} of a ring R is the smallest ideal that
contains those elements. It can be described as the set of all linear combinations


(11.3.17) r_1 a_1 + ... + r_n a_n

with coefficients r_i in the ring. This ideal is often denoted by (a_1, ..., a_n):

(11.3.18) (a_1, ..., a_n) = {r_1 a_1 + ... + r_n a_n | r_i ∈ R}.


For instance, the kernel K of the homomorphism φ: Z[x] → F_p that sends f(x) to
the residue of f(0) modulo p is the ideal (p, x) of Z[x] generated by p and x. Let’s check
this. First, p and x are in the kernel, so (p, x) ⊂ K. To show that K ⊂ (p, x), we let
f(x) = a_n x^n + ... + a_1 x + a_0 be an integer polynomial. Then f(0) = a_0. If a_0 ≡ 0 modulo p,
say a_0 = bp, then f is the linear combination bp + (a_n x^(n-1) + ... + a_1)x of p and x. So f
is in the ideal (p, x).
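The argument is constructive: it produces the coefficients of the linear combination. A minimal sketch, assuming integer polynomials are stored as coefficient lists [a_0, a_1, ..., a_n] and using the hypothetical name split_in_p_x:

```python
def split_in_p_x(f, p):
    """Write f = b*p + q(x)*x for a polynomial f in the kernel.

    f is the list [a_0, a_1, ..., a_n] with a_0 divisible by p. The return
    value is the pair (b, q), where a_0 = b*p and q = [a_1, ..., a_n].
    """
    a0 = f[0]
    if a0 % p != 0:
        raise ValueError("f(0) is not divisible by p, so f is not in the kernel")
    return a0 // p, f[1:]

# Sample: f = 7x^3 + 3x + 10 with p = 5, so f = 2*5 + (7x^2 + 3)*x.
f, p = [10, 3, 0, 7], 5
b, q = split_in_p_x(f, p)

# Reassemble: the constant term b*p followed by the coefficients of q(x)*x.
print([b * p] + q == f)
```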

The number of elements required to generate an ideal can be arbitrarily large. 
The ideal (x^3, x^2 y, x y^2, y^3) of the polynomial ring C[x, y] consists of the polynomials
in which every term has degree at least 3. It cannot be generated by fewer than four 
elements. 


In the rest of this section, we describe ideals in some simple cases.


Proposition 11.3.19


(a) The only ideals of a field are the zero ideal and the unit ideal. 
(b) A ring that has exactly two ideals is a field. 


Proof. If an ideal I of a field F contains a nonzero element a, that element is invertible.
Then I contains a^(-1) a = 1, and is the unit ideal. The only ideals of F are (0) and (1).

Assume that R has exactly two ideals. The properties that distinguish fields among
rings are that 1 ≠ 0 and that every nonzero element a of R has a multiplicative inverse. We
have seen that 1 = 0 happens only in the zero ring. The zero ring has only one ideal, the zero
ideal. Since our ring has two ideals, 1 ≠ 0 in R. The two ideals (1) and (0) are different, so
they are the only two ideals of R.

To show that every nonzero element a of R has an inverse, we consider the principal
ideal (a). It is not the zero ideal because it contains the element a. Therefore it is the unit
ideal. The elements of (a) are the multiples of a, so 1 is a multiple of a, and therefore a is
invertible. □


Corollary 11.3.20 Every homomorphism φ: F → R from a field F to a nonzero ring R is
injective.


Proof. The kernel of φ is an ideal of F. So according to Proposition 11.3.19, the kernel is
either (0) or (1). If ker φ were the unit ideal (1), φ would be the zero map. But the zero
map isn’t a homomorphism when R isn’t the zero ring. Therefore ker φ = (0), and φ is
injective. □


Proposition 11.3.21 The ideals in the ring of integers are the subgroups of Z^+, and they are
principal ideals.


An ideal of the ring Z of integers will be a subgroup of the additive group Z^+. It was proved
before (2.3.3) that every subgroup of Z^+ has the form Zn. □


The proof that subgroups of Z^+ have the form Zn can be adapted to the polynomial
ring F[x].


Proposition 11.3.22 Every ideal in the ring F[x] of polynomials in one variable x over a
field F is a principal ideal. A nonzero ideal I in F[x] is generated by the unique monic
polynomial of lowest degree that it contains.


Proof. Let I be an ideal of F[x]. The zero ideal is principal, so we may assume that I is not
the zero ideal. The first step in finding a generator for a nonzero subgroup of Z is to choose
its smallest positive element. The substitute here is to choose a nonzero polynomial f in I
of minimal degree. Since F is a field, we may choose f to be monic. We claim that I is the
principal ideal (f) of polynomial multiples of f. Since f is in I, every multiple of f is in I,
so (f) ⊂ I. To prove that I ⊂ (f), we choose an element g of I, and we use division with
remainder to write g = fq + r, where r, if not zero, has lower degree than f. Since g and f
are in I, g - fq = r is in I too. Since f has minimal degree among nonzero elements of I,
the only possibility is that r = 0. Therefore f divides g, and g is in (f).

If f_1 and f_2 are two monic polynomials of lowest degree n in I, their difference is in I
and has lower degree than n, so it must be zero. Therefore the monic polynomial of lowest
degree is unique. □


Example 11.3.23 Let γ = ∛2 be the real cube root of 2, and let Φ: Q[x] → C be the
substitution map that sends x ⇝ γ. The kernel of this map is a principal ideal, generated by
the monic polynomial of lowest degree in Q[x] that has γ as a root (11.3.22). The polynomial
x^3 - 2 is in the kernel, and because ∛2 is not a rational number, it is not the product
f = gh of two nonconstant polynomials with rational coefficients. So it is the lowest degree
polynomial in the kernel, and therefore it generates the kernel.

We restrict the map Φ to the integer polynomial ring Z[x], obtaining a homomorphism
Φ’: Z[x] → C. The next lemma shows that the kernel of Φ’ is the principal ideal of Z[x]
generated by the same polynomial f.


Lemma 11.3.24 Let f be a monic integer polynomial, and let g be another integer polynomial.
If f divides g in Q[x], then f divides g in Z[x].


Proof. Since f is monic, we can do division with remainder in Z[x]: g = fq + r. This
equation remains true in the ring Q[x], and division with remainder in Q[x] gives the same
result. In Q[x], f divides g. Therefore r = 0, and f divides g in Z[x]. □


The proof of the following corollary is similar to the proof of existence of the greatest 
common divisor in the ring of integers ((2.3.5), see also (12.2.8)). 


Corollary 11.3.25 Let R denote the polynomial ring F[x] in one variable over a field F,
and let f and g be elements of R, not both zero. Their greatest common divisor d(x) is the
unique monic polynomial that generates the ideal (f, g). It has these properties:


(a) Rd = Rf + Rg.

(b) d divides f and g.

(c) If a polynomial e = e(x) divides both f and g, it also divides d.

(d) There are polynomials p and q such that d = pf + qg. □
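The monic generator of (f, g) can be computed by repeated division with remainder, just as for integers. The sketch below works over F = Q with exact fractions; the coefficient-list representation [a_0, a_1, ...] and the names trim, divmod_poly, and monic_gcd are assumptions of the example, and the Euclidean algorithm is used here as a standard technique rather than one spelled out in this section.

```python
from fractions import Fraction

def trim(f):
    """Drop trailing zero coefficients, so the zero polynomial becomes []."""
    while f and f[-1] == 0:
        f = f[:-1]
    return f

def divmod_poly(g, f):
    """Division with remainder over Q: returns (q, r) with g = f*q + r.

    f must be nonzero; r is [] or has lower degree than f.
    """
    g = trim([Fraction(c) for c in g])
    f = trim([Fraction(c) for c in f])
    q = [Fraction(0)] * max(len(g) - len(f) + 1, 1)
    r = g[:]
    while len(r) >= len(f):
        shift = len(r) - len(f)
        c = r[-1] / f[-1]            # cancels the leading term of r
        q[shift] = c
        for i, a in enumerate(f):
            r[i + shift] -= c * a
        r = trim(r)
    return trim(q), r

def monic_gcd(f, g):
    """Monic generator of the ideal (f, g) in Q[x]; f and g not both zero."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    while g:
        f, g = g, divmod_poly(f, g)[1]
    return [c / f[-1] for c in f]

# Sample: f = x^2 - 1 and g = x^2 - 3x + 2 = (x - 1)(x - 2); expect d = x - 1.
f = [-1, 0, 1]
g = [2, -3, 1]
d = monic_gcd(f, g)
print(d == [-1, 1], divmod_poly(f, d)[1] == [], divmod_poly(g, d)[1] == [])
```

The sketch checks property (b) on the sample; producing the polynomials p and q of property (d) would require the extended form of the algorithm, which is not shown.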


The definition of the characteristic of a ring R is the same as for a field. It is the
non-negative integer n that generates the kernel of the homomorphism φ: Z → R (11.3.10).
If n = 0, the characteristic is zero, and this means that no positive multiple of 1 in R is equal
to zero. Otherwise n is the smallest positive integer such that “n times 1” is zero in R. The
characteristic of a ring can be any non-negative integer.
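For a finite ring the characteristic can be found by adding 1 to itself until the sum is zero. The following sketch assumes the ring is finite (otherwise the loop need not stop) and is described by its zero, its unit element, and its addition; the product ring Z/4 × Z/6 is a sample chosen for illustration.

```python
def characteristic(zero, one, add):
    """Smallest n > 0 such that 1 + ... + 1 (n terms) is zero.

    Assumes a finite ring, so that some positive multiple of 1 vanishes.
    """
    total, n = one, 1
    while total != zero:
        total, n = add(total, one), n + 1
    return n

# Sample ring: Z/4 x Z/6 with componentwise addition.
def add_pair(u, v):
    return ((u[0] + v[0]) % 4, (u[1] + v[1]) % 6)

char = characteristic((0, 0), (1, 1), add_pair)
print(char)
```

On this sample the loop stops at the first n divisible by both 4 and 6.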


11.4 QUOTIENT RINGS 


Let I be an ideal of a ring R. The cosets of the additive subgroup I^+ of R^+ are the subsets
a + I. It follows from what has been proved for groups that the set of cosets R̄ = R/I is a
group under addition. It is also a ring:


Theorem 11.4.1 Let I be an ideal of a ring R. There is a unique ring structure on the set
R̄ of additive cosets of I such that the map π: R → R̄ that sends a ⇝ ā = [a + I] is a ring
homomorphism. The kernel of π is the ideal I.


As with quotient groups, the map π is referred to as the canonical map, and R̄ is called
the quotient ring. The image ā of an element a is called the residue of the element.


Proof. This proof has already been carried out for the ring of integers (Section 2.9). We
want to put a ring structure on R̄, and if we forget about multiplication and consider only
the addition law, I becomes a normal subgroup of R^+, for which the proof has been given
(2.12.2). What is left to do is to define multiplication, to verify the ring axioms, and to prove
that π is a homomorphism. Let ā = [a + I] and b̄ = [b + I] be elements of R̄. We would
like to define the product by setting āb̄ = [ab + I]. The set of products


P = (a + I)(b + I) = {rs | r ∈ a + I, s ∈ b + I}


isn’t always a coset of I. However, as in the case of the ring of integers, P is always contained
in the coset ab + I. If we write r = a + u and s = b + v with u and v in I, then


(a + u)(b + v) = ab + (av + bu + uv).


Since I is an ideal that contains u and v, it contains av + bu + uv. This is all that is needed
to define the product coset: It is the coset that contains the set of products. That coset is
unique because the cosets partition R.

The proofs of the remaining assertions follow the patterns set in Section 2.9. □
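Both halves of the argument can be watched in the ring of integers. This sketch samples finitely many elements of the cosets a + I and b + I for the ideal I = 6Z; the sample size and the values a = 2, b = 3 are arbitrary choices made for illustration.

```python
def coset_sample(a, n, k=5):
    """A few elements a + n*t of the coset a + nZ, for t = -k, ..., k."""
    return [a + n * t for t in range(-k, k + 1)]

n, a, b = 6, 2, 3    # hypothetical sample: I = 6Z, cosets 2 + I and 3 + I

# Every sampled product (a + u)(b + v) lies in the coset ab + I.
products = {r * s for r in coset_sample(a, n) for s in coset_sample(b, n)}
contained = all((x - a * b) % n == 0 for x in products)

# The set of products need not be a whole coset: for a = b = 0 every product
# is a multiple of 36, so the element 6 of 0 + I is never a product.
zero_products = {r * s for r in coset_sample(0, n) for s in coset_sample(0, n)}
all_multiples_of_36 = all(x % (n * n) == 0 for x in zero_products)
print(contained, all_multiples_of_36)
```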


As with groups, one often drops the bars over the letters that represent elements of a
quotient ring R̄, remembering that “a = b in R̄” means ā = b̄.


The next theorems are analogous to ones that we have seen for groups: 


Theorem 11.4.2 Mapping Property of Quotient Rings. Let f: R → R’ be a ring homomorphism
with kernel K and let I be another ideal. Let π: R → R̄ be the canonical map from
R to R̄ = R/I.

(a) If I ⊂ K, there is a unique homomorphism f̄: R̄ → R’ such that f̄π = f:


             f
        R ------→ R’
        |        ↗
      π |      / f̄
        ↓    /
      R̄ = R/I

(b) (First Isomorphism Theorem) If f is surjective and I = K, f̄ is an isomorphism. □


The First Isomorphism Theorem is our fundamental method of identifying quotient
rings. However, it doesn’t apply very often. Quotient rings will be new rings in most cases, and
this is one reason that the quotient construction is important. The ring C[x, y]/(y^2 - x^3 + 1),
for example, is completely different from any ring we have seen up to now. Its elements are
functions on an elliptic curve (see [Silverman]).

The Correspondence Theorem for rings describes the fundamental relationship between
ideals in a ring and a quotient ring.


Theorem 11.4.3 Correspondence Theorem. Let φ: R → ℛ be a surjective ring homomorphism
with kernel K. There is a bijective correspondence between the set of all ideals of ℛ
and the set of ideals of R that contain K:


{ideals of R that contain K} ←→ {ideals of ℛ}.


This correspondence is defined as follows:


• If I is an ideal of R and if K ⊂ I, the corresponding ideal of ℛ is φ(I).
• If ℐ is an ideal of ℛ, the corresponding ideal of R is φ^(-1)(ℐ).


If the ideal I of R corresponds to the ideal ℐ of ℛ, the quotient rings R/I and ℛ/ℐ are
naturally isomorphic.


Note that the inclusion K ⊂ I is the reverse of the one in the mapping property.


Proof of the Correspondence Theorem. We let ℐ be an ideal of ℛ and we let I be an ideal
of R that contains K. We must check the following points:


• φ(I) is an ideal of ℛ.

• φ^(-1)(ℐ) is an ideal of R, and it contains K.

• φ(φ^(-1)(ℐ)) = ℐ, and φ^(-1)(φ(I)) = I.

• If φ(I) = ℐ, then R/I ≈ ℛ/ℐ.


We go through these points in order, referring to the proof of the Correspondence Theorem
2.10.5 for groups when it applies. We have seen before that the image of a subgroup is a
subgroup. So to show that φ(I) is an ideal of ℛ, we need only prove that it is closed under
multiplication by elements of ℛ. Let r̃ be in ℛ and let x̃ be in φ(I). Then x̃ = φ(x) for some
x in I, and because φ is surjective, r̃ = φ(r) for some r in R. Since I is an ideal, rx is in I,
and r̃x̃ = φ(rx), so r̃x̃ is in φ(I).

Next, we verify that φ^(-1)(ℐ) is an ideal of R that contains K. This is true whether or
not φ is surjective. Let’s write φ(a) = ã. By definition of the inverse image, a is in φ^(-1)(ℐ)
if and only if ã is in ℐ. If a is in φ^(-1)(ℐ) and r is in R, then φ(ra) = r̃ã is in ℐ because ℐ is
an ideal, and hence ra is in φ^(-1)(ℐ). The facts that φ^(-1)(ℐ) is closed under sums and that it
contains K were shown in (2.10.4).

The third assertion, the bijectivity of the correspondence, follows from the case of a
group homomorphism.

Finally, suppose that an ideal I of R that contains K corresponds to an ideal ℐ of ℛ,
that is, ℐ = φ(I) and I = φ^(-1)(ℐ). Let π̃: ℛ → ℛ/ℐ be the canonical map, and let f denote
the composed map π̃φ: R → ℛ → ℛ/ℐ. The kernel of f is the set of elements x in R such
that π̃φ(x) = 0, which translates to φ(x) ∈ ℐ, or to x ∈ φ^(-1)(ℐ) = I. The kernel of f is I.
The mapping property, applied to the map f, gives us a homomorphism f̄: R/I → ℛ/ℐ,
and the First Isomorphism Theorem asserts that f̄ is an isomorphism. □

To apply the Correspondence Theorem, it helps to know the ideals of one of the rings.
The next examples illustrate this in very simple situations, in which one of the two rings is
C[t]. We will be able to use the fact that every ideal of C[t] is principal (11.3.22).


Example 11.4.4 (a) Let φ: C[x, y] → C[t] be the homomorphism that sends x ⇝ t and
y ⇝ t^2. This is a surjective map, and its kernel K is the principal ideal of C[x, y] generated
by y - x^2. (The proof of this is similar to the one given in Example 11.3.16.)

The Correspondence Theorem relates ideals I of C[x, y] that contain y - x^2 to ideals
J of C[t], by J = φ(I) and I = φ^(-1)(J). Here J will be a principal ideal, generated by
a polynomial p(t). Let I_1 denote the ideal of C[x, y] generated by y - x^2 and p(x).
Then I_1 contains K, and its image is equal to J. The Correspondence Theorem asserts
that I_1 = I. Every ideal of the polynomial ring C[x, y] that contains y - x^2 has the form
I = (y - x^2, p(x)), for some polynomial p(x).


(b) We identify the ideals of the quotient ring R’ = C[t]/(t^2 - 1) using the canonical
homomorphism π: C[t] → R’. The kernel of π is the principal ideal (t^2 - 1). Let I be an
ideal of C[t] that contains t^2 - 1. Then I is principal, generated by a monic polynomial f,
and the fact that t^2 - 1 is in I means that f divides t^2 - 1. The monic divisors of t^2 - 1 are:
1, t - 1, t + 1 and t^2 - 1. Therefore the ring R’ contains exactly four ideals. They are the
principal ideals generated by the residues of the divisors of t^2 - 1. □


Adding Relations 


We reinterpret the quotient ring construction when the ideal I is principal, say I = (a). In
this situation, we think of R̄ = R/I as the ring obtained by imposing the relation a = 0
on R, or of killing the element a. For instance, the field F_7 will be thought of as the ring
obtained by killing 7 in the ring Z of integers.

Let’s examine the collapsing that takes place in the map π: R → R̄. Its kernel is the
ideal I, so a is in the kernel: π(a) = 0. If b is any element of R, the elements that have the
same image in R̄ as b are those in the coset b + I, and since I = (a) those elements have
the form b + ra. We see that imposing the relation a = 0 in the ring R forces us also to set
b = b + ra for all b and r in R, and that these are the only consequences of killing a.

Any number of relations a_1 = 0, ..., a_n = 0 can be introduced, by working modulo
the ideal I generated by a_1, ..., a_n, the set of linear combinations r_1 a_1 + ... + r_n a_n, with
coefficients r_i in R. The quotient ring R̄ = R/I is viewed as the ring obtained by killing the
n elements. Two elements b and b’ of R have the same image in R̄ if and only if b’ has the
form b + r_1 a_1 + ... + r_n a_n for some r_i in R.

The more relations we add, the more collapsing takes place in the map π. If we add
relations carelessly, the worst that can happen is that we may end up with I = R and R̄ = 0.
All relations a = 0 become true when we collapse R to the zero ring.

Here the Correspondence Theorem asserts something that is intuitively clear: Introducing
relations one at a time or all together leads to isomorphic results. To spell this out,
let a and b be elements of a ring R, and let R̄ = R/(a) be the result of killing a in R. Let b̄
be the residue of b in R̄. The Correspondence Theorem tells us that the principal ideal (b̄)
of R̄ corresponds to the ideal (a, b) of R, and that R/(a, b) is isomorphic to R̄/(b̄). Killing
a and b in R at the same time gives the same result as killing b̄ in the ring R̄ that is obtained
by killing a first.

Example 11.4.5 We ask to identify the quotient ring R = Z[i]/(i - 2), the ring obtained from
the Gauss integers by introducing the relation i - 2 = 0. Instead of analyzing this directly,
we note that the kernel of the map Z[x] → Z[i] sending x ⇝ i is the principal ideal of Z[x]
generated by f = x^2 + 1. The First Isomorphism Theorem tells us that Z[x]/(f) ≈ Z[i]. The
image of g = x - 2 is i - 2, so R can also be obtained by introducing the two relations f = 0
and g = 0 into the integer polynomial ring. Let I = (f, g) be the ideal of Z[x] generated by
the two polynomials f and g. Then R ≈ Z[x]/I.

To form R, we may introduce the two relations in the opposite order, first killing g,
then f. The principal ideal (g) of Z[x] is the kernel of the homomorphism Z[x] → Z that
sends x ⇝ 2. So when we kill x - 2 in Z[x], we obtain a ring isomorphic to Z, in which the
residue of x is 2. Then the residue of f = x^2 + 1 becomes 5. So we can also obtain R by
killing 5 in Z, and therefore R ≈ F_5.

The rings we have mentioned are summed up in this diagram: 


                kill x - 2
(11.4.6)   Z[x] -----------→ Z
            |                |
 kill       |                | kill
 x^2 + 1    ↓                ↓ 5
           Z[i] -----------→ F_5
                kill i - 2                □
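The isomorphism of Example 11.4.5 can be made explicit: setting i = 2 and then reducing modulo 5 sends a + bi to the residue of a + 2b. The sketch below checks on a sample of Gauss integers that this map respects multiplication and kills i - 2; the pair representation (a, b) for a + bi and the names to_F5 and gauss_mul are assumptions of the example.

```python
def to_F5(z):
    """Image in F_5 of the Gauss integer z = (a, b), meaning a + bi, under i -> 2."""
    a, b = z
    return (a + 2 * b) % 5

def gauss_mul(z, w):
    """Product of Gauss integers: (a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

sample = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]

# Compatibility with multiplication on the sample.
multiplicative = all(
    to_F5(gauss_mul(z, w)) == (to_F5(z) * to_F5(w)) % 5
    for z in sample for w in sample
)

# The element i - 2 maps to zero, and every residue 0, ..., 4 is reached.
kills_relation = to_F5((-2, 1)) == 0
surjective = {to_F5(z) for z in sample} == {0, 1, 2, 3, 4}
print(multiplicative, kills_relation, surjective)
```

The check works because (a + 2b)(c + 2d) and (ac - bd) + 2(ad + bc) differ by 5bd.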


11.5 ADJOINING ELEMENTS 


In this section we discuss a procedure closely related to that of adding relations: adjoining
new elements to a ring. Our model for this procedure is the construction of the complex
number field from the real numbers. That construction is completely formal: The complex
number i has no properties other than its defining property: i^2 = -1. We will now describe
the general principle behind this construction. We start with an arbitrary ring R, and consider
the problem of building a bigger ring containing the elements of R and also a new element,
which we denote by α. We will probably want α to satisfy some relation such as α^2 + 1 = 0.
A ring that contains another ring as a subring is called a ring extension. So we are looking
for a suitable extension.

Sometimes the element α may be available in a ring extension R’ that we already know.
In that case, our solution is the subring of R’ generated by R and α, the smallest subring
containing R and α. The subring is denoted by R[α]. We described this ring in Section 11.1 in
the case R = Z, and the description is no different in general: R[α] consists of the elements
β of R’ that have polynomial expressions


β = r_n α^n + ... + r_1 α + r_0


with coefficients r_i in R.

But as happens when we construct C from R, we may not yet have an extension
containing α. Then we must construct the extension abstractly. We start with the polynomial
ring R[x]. It is generated by R and x. The element x satisfies no relations other than those
implied by the ring axioms, and we will probably want our new element α to satisfy some
relations. But now that we have the ring R[x] in hand, we can add relations to it using the
procedure explained in the previous section on the polynomial ring R[x]. The fact that R is
replaced by R[x] complicates the notation, but aside from this, nothing is different.

For example, we construct the complex numbers by introducing the relation x^2 + 1 = 0
into the ring P = R[x] of real polynomials. We form the quotient ring P̄ = P/(x^2 + 1), and
the residue of x becomes our element i. The relation x̄^2 + 1 = 0 holds in P̄ because the map
π: P → P̄ is a homomorphism and because x^2 + 1 is in its kernel. So P̄ is isomorphic to C.


In general, say that we want to adjoin an element α to a ring R, and that we want α to
satisfy the polynomial relation f(x) = 0, where


(11.5.1) f(x) = a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0, with a_i in R.


The solution is R’ = R[x]/(f), where (f) is the principal ideal of R[x] generated by f.
We let α denote the residue x̄ of x in R’. Then because the map π: R[x] → R[x]/(f)
is a homomorphism,


(11.5.2) π(f(x)) = f̄(x̄) = ā_n α^n + ... + ā_0 = 0.


Here ā_i is the image in R’ of the constant polynomial a_i. So, dropping bars, α satisfies the
relation f(α) = 0. The ring obtained in this way may be denoted by R[α] too.

An example: Let a be an element of a ring R. An inverse of a is an element α that
satisfies the relation


(11.5.3) aα - 1 = 0.


So we can adjoin an inverse by forming the quotient ring R’ = R[x]/(ax - 1).


The most important case is that our element α is a root of a monic polynomial:

(11.5.4) f(x) = x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0, with a_i in R.

We can describe the ring R[α] precisely in this case.


Proposition 11.5.5 Let R be a ring, and let f(x) be a monic polynomial of positive degree n
with coefficients in R. Let R[α] denote the ring R[x]/(f) obtained by adjoining an element
satisfying the relation f(α) = 0.

(a) The set (1, α, ..., α^(n-1)) is a basis of R[α] over R: every element of R[α] can be written
uniquely as a linear combination of this basis, with coefficients in R.

(b) Addition of two linear combinations is vector addition.

(c) Multiplication of linear combinations is as follows: Let β_1 and β_2 be elements of R[α],
and let g_1(x) and g_2(x) be polynomials such that β_1 = g_1(α) and β_2 = g_2(α). One
divides the product polynomial g_1 g_2 by f, say g_1 g_2 = fq + r, where the remainder
r(x), if not zero, has degree < n. Then β_1 β_2 = r(α).


The next lemma should be clear. 


Lemma 11.5.6 Let f be a monic polynomial of degree n in a polynomial ring R[x]. Every
nonzero element of (f) has degree at least n. □

Proof of the proposition. (a) Since R[α] is a quotient of the polynomial ring R[x], every
element β of R[α] is the residue of a polynomial g(x), i.e., β = g(α). Since f is monic, we
can perform division with remainder: g(x) = f(x)q(x) + r(x), where r(x) is either zero or
else has degree less than n (11.2.9). Then since f(α) = 0, β = g(α) = r(α). In this way, β is
written as a combination of the basis. The expression for β is unique because the principal
ideal (f) contains no nonzero element of degree < n. This also proves (c), and (b) follows from the
fact that addition in R[x] is vector addition. □
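Part (c) of the proposition is an algorithm, and a short sketch makes it concrete. Here R = Z, elements of R[α] are stored as coefficient lists [c_0, ..., c_(n-1)] in the basis (1, α, ..., α^(n-1)), and the monic polynomial f is the list [a_0, ..., a_(n-1), 1]. The names poly_mul, reduce_mod_monic, and mul_in_quotient are hypothetical, and the sample takes f = x^3 - 2.

```python
def poly_mul(f, g):
    """Product of two integer polynomials given as coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def reduce_mod_monic(g, f):
    """Remainder of g on division by the monic polynomial f.

    The result always has exactly deg(f) coefficients when deg(g) >= deg(f).
    """
    r = list(g)
    n = len(f) - 1                  # degree of f
    while len(r) > n:
        c = r.pop()                 # leading coefficient of r
        shift = len(r) - n          # subtract c * x^shift * f
        for i in range(n):
            r[shift + i] -= c * f[i]
    return r

def mul_in_quotient(b1, b2, f):
    """Product of two elements of Z[x]/(f), following Proposition 11.5.5(c)."""
    return reduce_mod_monic(poly_mul(b1, b2), f)

f = [-2, 0, 0, 1]        # x^3 - 2, so alpha behaves like a cube root of 2
beta1 = [0, -1, 1]       # alpha^2 - alpha
beta2 = [1, 0, 1]        # alpha^2 + 1
product = mul_in_quotient(beta1, beta2, f)
print(product)           # coefficients of 1, alpha, alpha^2
```

Because f is monic, no division of coefficients is ever needed, which is why the same code works over Z and not only over a field.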


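Examples 11.5.7 (a) The kernel of the substitution map $\mathbb{Z}[x] \to \mathbb{C}$ that sends $x \rightsquigarrow \gamma = \sqrt[3]{2}$
is the principal ideal $(x^3 - 2)$ of $\mathbb{Z}[x]$ (11.3.23). So $\mathbb{Z}[\gamma]$ is isomorphic to $\mathbb{Z}[x]/(x^3 - 2)$. The
proposition shows that $(1, \gamma, \gamma^2)$ is a $\mathbb{Z}$-basis for $\mathbb{Z}[\gamma]$. Its elements are linear combinations
$a_0 + a_1\gamma + a_2\gamma^2$, where $a_i$ are integers. If $\beta_1 = (\gamma^2 - \gamma)$ and $\beta_2 = (\gamma^2 + 1)$, then


$\beta_1\beta_2 = \gamma^4 - \gamma^3 + \gamma^2 - \gamma = f(\gamma)(\gamma - 1) + (\gamma^2 + \gamma - 2) = \gamma^2 + \gamma - 2.$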


(b) Let $R'$ be obtained by adjoining an element $\delta$ to $\mathbb{F}_5$ with the relation $\delta^2 - 3 = 0$. Here $\delta$
becomes an abstract square root of 3. Proposition 11.5.5 tells us that the elements of $R'$ are
the 25 linear expressions $a + b\delta$ with coefficients $a$ and $b$ in $\mathbb{F}_5$.

We’ll show that $R'$ is a field of order 25 by showing that every nonzero element $a + b\delta$
of $R'$ is invertible. To see this, consider the product $c = (a + b\delta)(a - b\delta) = a^2 - 3b^2$. This
is an element of $\mathbb{F}_5$, and because 3 isn’t a square in $\mathbb{F}_5$, it isn’t zero unless both $a$ and $b$ are
zero. So if $a + b\delta \neq 0$, $c$ is invertible in $\mathbb{F}_5$. Then the inverse of $a + b\delta$ is $(a - b\delta)c^{-1}$.


(c) The procedure used in (b) doesn’t yield a field when it is applied to $\mathbb{F}_{11}$. The reason is
that $\mathbb{F}_{11}$ already contains two square roots of 3, namely $\pm 5$. If $R'$ is the ring obtained by
adjoining $\delta$ with the relation $\delta^2 - 3 = 0$, we are adjoining an abstract square root of 3, though
$\mathbb{F}_{11}$ already contains two square roots. At first glance one might expect to get $\mathbb{F}_{11}$ back. We
don’t, because we haven’t told $\delta$ to be equal to 5 or $-5$. We’ve told $\delta$ only that its square is 3.
So $\delta - 5$ and $\delta + 5$ are not zero, but $(\delta + 5)(\delta - 5) = \delta^2 - 3 = 0$. This cannot happen in a
field. □
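The multiplication rule of Proposition 11.5.5(c) is easy to carry out mechanically. The following is a minimal sketch in Python, not part of the text: polynomials are stored as coefficient lists with the lowest degree first, and the function names (`poly_mul`, `remainder`, `mul_in_quotient`, `all_nonzero_invertible`) are our own choices for illustration. It recomputes the product in Example (a) and checks Examples (b) and (c) by brute force.

```python
from itertools import product

def poly_mul(g, h):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(g) + len(h) - 1)
    for i, a in enumerate(g):
        for j, b in enumerate(h):
            out[i + j] += a * b
    return out

def remainder(g, f, mod=None):
    """Remainder of g on division by the monic polynomial f, as a list of length deg f.

    If mod is given, the coefficients are reduced modulo mod at the end."""
    if f[-1] != 1:
        raise ValueError("f must be monic")
    n = len(f) - 1
    r = list(g)
    while len(r) > n:
        lead = r.pop()          # leading coefficient; its degree is len(r) after the pop
        shift = len(r) - n      # subtract lead * x^shift * f, which cancels the top term
        for k in range(n):
            r[shift + k] -= lead * f[k]
    r += [0] * (n - len(r))     # pad so the result has exactly n coordinates
    return [c % mod for c in r] if mod else r

def mul_in_quotient(b1, b2, f, mod=None):
    """Product in R[x]/(f) of elements given by coordinates in the basis 1, a, ..., a^(n-1)."""
    return remainder(poly_mul(b1, b2), f, mod)

def all_nonzero_invertible(p, f):
    """Brute-force check that every nonzero element of F_p[x]/(f) has an inverse."""
    n = len(f) - 1
    elems = [list(e) for e in product(range(p), repeat=n)]
    one = [1] + [0] * (n - 1)
    return all(
        any(mul_in_quotient(u, v, f, p) == one for v in elems)
        for u in elems if any(u)
    )

# Example (a): (gamma^2 - gamma)(gamma^2 + 1) in Z[x]/(x^3 - 2)
print(mul_in_quotient([0, -1, 1], [1, 0, 1], [-2, 0, 0, 1]))   # [-2, 1, 1], i.e. gamma^2 + gamma - 2
# Examples (b) and (c): adjoin a square root of 3 to F_5 and to F_11
print(all_nonzero_invertible(5, [-3, 0, 1]))                   # True
print(mul_in_quotient([5, 1], [-5, 1], [-3, 0, 1], mod=11))    # [0, 0]: delta + 5 and delta - 5 are zero divisors
```

The brute-force inverse search is only practical for small finite rings; it is meant as a check on the examples, not as a method.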


It is harder to analyze the structure of the ring obtained by adjoining an element when
the polynomial relation isn’t monic.


• There is a point that we have suppressed in our discussion, and we consider it now:
When we adjoin an element $\alpha$ to a ring $R$ with some relation $f(\alpha) = 0$, will our original
$R$ be a subring of the ring $R'$ that we construct? We know that $R$ is contained in the
polynomial ring $R[x]$, as the subring of constant polynomials, and we also have the canonical
map $\pi: R[x] \to R' = R[x]/(f)$. Restricting $\pi$ to the constant polynomials gives us a
homomorphism $R \to R'$, let’s call it $\psi$. Is $\psi$ injective? If it isn’t injective, we cannot identify
$R$ with a subring of $R'$.
The kernel of $\psi$ is the set of constant polynomials in the ideal:


(11.5.8) $\ker \psi = R \cap (f)$.


It is fairly likely that $\ker \psi$ is zero because $f$ will have positive degree. There will have to
be a lot of cancellation to make a polynomial multiple of $f$ have degree zero. The kernel
is zero when $\alpha$ is required to satisfy a monic polynomial relation. But it isn’t always zero.
For instance, let $R$ be the ring $\mathbb{Z}/(6)$ of congruence classes modulo 6, and let $f$ be the
polynomial $2x + 1$ in $R[x]$. Then $3f = 3$, so the nonzero constant 3 lies in $(f)$. The kernel of the map $R \to R[x]/(f)$ is not zero.
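The cancellation in this example can be checked in one line. The sketch below is our own illustration (the helper `scale` is a hypothetical name): it multiplies $f = 2x + 1$ by the constant 3 in $(\mathbb{Z}/(6))[x]$, with coefficients listed lowest degree first.

```python
def scale(c, f, n):
    """Multiply the polynomial f by the constant c, with coefficients taken modulo n."""
    return [(c * a) % n for a in f]

f = [1, 2]               # 1 + 2x in (Z/6)[x]
print(scale(3, f, 6))    # [3, 0]: the constant polynomial 3 is a multiple of f
```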


11.6 PRODUCT RINGS 


The product $G \times G'$ of two groups was defined in Chapter 2. It is the product set, and the law
of composition is componentwise: $(x, x')(y, y') = (xy, x'y')$. The analogous construction
can be made with rings.


Proposition 11.6.1 Let $R$ and $R'$ be rings.


(a) The product set $R \times R'$ is a ring called the product ring, with component-wise addition
and multiplication:


$(x, x') + (y, y') = (x + y, x' + y')$ and $(x, x')(y, y') = (xy, x'y')$.


(b) The additive and multiplicative identities in $R \times R'$ are $(0, 0)$ and $(1, 1)$, respectively.
(c) The projections $\pi: R \times R' \to R$ and $\pi': R \times R' \to R'$ defined by $\pi(x, x') = x$ and
$\pi'(x, x') = x'$ are ring homomorphisms. The kernels of $\pi$ and $\pi'$ are the ideals $\{0\} \times R'$
and $R \times \{0\}$, respectively, of $R \times R'$.
(d) The kernel $R \times \{0\}$ of $\pi'$ is a ring, with multiplicative identity $e = (1, 0)$. It is not a
subring of $R \times R'$ unless $R'$ is the zero ring. Similarly, $\{0\} \times R'$ is a ring with identity
$e' = (0, 1)$. It is not a subring of $R \times R'$ unless $R$ is the zero ring.


The proofs of these assertions are very elementary. We omit them, but see the next 
proposition for part (d). □

To determine whether or not a given ring is isomorphic to a product ring, one looks 
for the elements that in a product ring would be (1, 0) and (0,1). They are idempotent 
elements. 


• An idempotent element $e$ of a ring $S$ is an element of $S$ such that $e^2 = e$.


Proposition 11.6.2 Let $e$ be an idempotent element of a ring $S$.


(a) The element $e' = 1 - e$ is also idempotent, $e + e' = 1$, and $ee' = 0$.

(b) With the laws of composition obtained by restriction from $S$, the principal ideal $eS$ is
a ring with identity element $e$, and multiplication by $e$ defines a ring homomorphism
$S \to eS$.

(c) The ideal $eS$ is not a subring of $S$ unless $e$ is the unit element 1 of $S$ and $e' = 0$.

(d) The ring $S$ is isomorphic to the product ring $eS \times e'S$.


Proof. (a) $e'^2 = (1 - e)^2 = 1 - 2e + e^2 = 1 - 2e + e = 1 - e = e'$, and $ee' = e(1 - e) = e - e^2 = 0$.


(b) Every ideal $I$ of a ring $S$ has the properties of a ring except for the existence of a
multiplicative identity. In this case, $e$ is an identity element for $eS$, because if $a$ is in $eS$,
say $a = es$, then $ea = e^2 s = es = a$. The ring axioms show that multiplication by $e$ is a
homomorphism: $e(a + b) = ea + eb$, $e(ab) = e^2 ab = (ea)(eb)$, and $e1 = e$.

(c) To be a subring of $S$, $eS$ must contain the identity 1 of $S$. If it does, then $e$ and 1 will both
be identity elements of $eS$, and since the identity in a ring is unique, $e = 1$ and $e' = 0$.


(d) The rule $\varphi(x) = (ex, e'x)$ defines a homomorphism $\varphi: S \to eS \times e'S$, because both
of the maps $x \rightsquigarrow ex$ and $x \rightsquigarrow e'x$ are homomorphisms and the laws of composition in the
product ring are componentwise. We verify that this homomorphism is bijective. First, if
$\varphi(x) = (0, 0)$, then $ex = 0$ and $e'x = 0$. If so, then $x = (e + e')x = ex + e'x = 0$ too. This
shows that $\varphi$ is injective. To show that $\varphi$ is surjective, let $(u, v)$ be an element of $eS \times e'S$,
say $u = ex$ and $v = e'y$. Then $\varphi(u + v) = (e(ex + e'y), e'(ex + e'y)) = (u, v)$. So $(u, v)$ is
in the image, and therefore $\varphi$ is surjective. □
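As a small sanity check of part (d), one can take $S = \mathbb{Z}/(6)$ with the idempotent $e = 3$, so that $e' = 4$. The sketch below is our own illustration, with hypothetical helper names; it lists $eS$ and $e'S$ and verifies by brute force that $x \rightsquigarrow (ex, e'x)$ is a bijection compatible with addition and multiplication.

```python
def idempotent_decomposition(n, e):
    """For an idempotent e of Z/(n), return (eS, e'S, phi) with phi(x) = (ex, e'x)."""
    if (e * e) % n != e % n:
        raise ValueError("e is not idempotent modulo n")
    e2 = (1 - e) % n                                   # e' = 1 - e
    eS = sorted({(e * x) % n for x in range(n)})
    e2S = sorted({(e2 * x) % n for x in range(n)})
    phi = {x: ((e * x) % n, (e2 * x) % n) for x in range(n)}
    return eS, e2S, phi

def is_ring_isomorphism(n, eS, e2S, phi):
    """Check that phi is a bijection onto eS x e'S that respects + and *."""
    bijective = len(set(phi.values())) == n == len(eS) * len(e2S)
    compatible = all(
        phi[(x + y) % n] == ((phi[x][0] + phi[y][0]) % n, (phi[x][1] + phi[y][1]) % n)
        and phi[(x * y) % n] == ((phi[x][0] * phi[y][0]) % n, (phi[x][1] * phi[y][1]) % n)
        for x in range(n) for y in range(n)
    )
    return bijective and compatible

eS, e2S, phi = idempotent_decomposition(6, 3)
print(eS, e2S)                                 # [0, 3] [0, 2, 4]
print(is_ring_isomorphism(6, eS, e2S, phi))    # True
```

Here $eS$ has two elements and $e'S$ has three, which matches the familiar decomposition of $\mathbb{Z}/(6)$ into rings of orders 2 and 3.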


Examples 11.6.3 (a) We go back to the ring $R'$ obtained by adjoining an abstract square
root of 3 to $\mathbb{F}_{11}$. Its elements are the $11^2$ linear combinations $a + b\delta$, with $a$ and $b$ in $\mathbb{F}_{11}$ and
$\delta^2 = 3$. We saw in (11.5.7)(c) that this ring is not a field, the reason being that $\mathbb{F}_{11}$ already
contains two square roots $\pm 5$ of 3. The elements $e = \delta - 5$ and $e' = -\delta - 5$ are idempotents
in $R'$, and $e + e' = 1$. Therefore $R'$ is isomorphic to the product $eR' \times e'R'$. Since the order
of $R'$ is $11^2$, $|eR'| = |e'R'| = 11$. The rings $eR'$ and $e'R'$ are both isomorphic to $\mathbb{F}_{11}$, and $R'$
is isomorphic to the product ring $\mathbb{F}_{11} \times \mathbb{F}_{11}$.


(b) We define a homomorphism $\varphi: \mathbb{C}[x, y] \to \mathbb{C}[x] \times \mathbb{C}[y]$ from the polynomial ring in two
variables to the product ring by $\varphi(f(x, y)) = (f(x, 0), f(0, y))$. Its kernel is the set of
polynomials $f(x, y)$ divisible both by $y$ and by $x$, which is the principal ideal of $\mathbb{C}[x, y]$
generated by $xy$. The map isn’t quite surjective. Its image is the subring of the product
consisting of pairs $(p(x), q(y))$ of polynomials with the same constant term. So the quotient
$\mathbb{C}[x, y]/(xy)$ is isomorphic to that subring. □
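The idempotents in Example (a) can be verified by direct computation. In the sketch below (our own illustration, not part of the text), an element $a + b\delta$ of $R'$ is stored as the pair `(a, b)` with entries modulo 11, and $\delta^2$ is replaced by 3 when multiplying.

```python
P = 11

def mul(u, v):
    """(a + b*delta)(c + d*delta) with delta^2 = 3 and coefficients in F_11."""
    (a, b), (c, d) = u, v
    return ((a * c + 3 * b * d) % P, (a * d + b * c) % P)

def add(u, v):
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)

ring = [(a, b) for a in range(P) for b in range(P)]
e = (-5 % P, 1)           # delta - 5
e2 = (-5 % P, -1 % P)     # -delta - 5

print(mul(e, e) == e, mul(e2, e2) == e2)    # True True
print(add(e, e2), mul(e, e2))               # (1, 0) (0, 0)
print(len({mul(e, u) for u in ring}), len({mul(e2, u) for u in ring}))   # 11 11
```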


11.7 FRACTIONS


In this section we consider the use of fractions in rings other than the integers. For instance,
a fraction $p/q$ of polynomials $p$ and $q$, with $q$ not zero, is called a rational function.

Let’s review the arithmetic of integer fractions. In order to apply the statements below 
to other rings, we denote the ring of integers by the neutral symbol R. 


• A fraction is a symbol $a/b$, or $\frac{a}{b}$, where $a$ and $b$ are elements of $R$ and $b$ is not zero.

• Elements of $R$ are viewed as fractions by the rule $a = a/1$.

• Two fractions $a_1/b_1$ and $a_2/b_2$ are equivalent, $a_1/b_1 \approx a_2/b_2$, if the elements of $R$
that are obtained by “cross multiplying” are equal, i.e., if $a_1 b_2 = a_2 b_1$.

• Sums and products of fractions are given by $\dfrac{a}{b} + \dfrac{c}{d} = \dfrac{ad + bc}{bd}$, $\dfrac{a}{b} \cdot \dfrac{c}{d} = \dfrac{ac}{bd}$.


We use the term “equivalent” in the third item because, strictly speaking, the fractions aren’t
actually equal.


A problem arises when one replaces the integers by an arbitrary ring $R$: In the
definition of addition, the denominator of the sum is the product $bd$. Since denominators
aren’t allowed to be zero, $bd$ had better not be zero. Since $b$ and $d$ are denominators, they
aren’t zero individually, but we need to know that the product of nonzero elements of $R$ is
nonzero. This turns out to be the only problem, but it isn’t always true. For example, in the
ring $\mathbb{Z}/(6)$ of congruence classes modulo 6, the classes 2 and 3 are not zero, but $2 \cdot 3 = 0$. Or,
in a product $R \times R'$ of nonzero rings, the idempotents $(1, 0)$ and $(0, 1)$ are nonzero elements
whose product is zero. One cannot work with fractions in those rings.


• An integral domain $R$, or just a domain for short, is a ring with this property: $R$ is not the
zero ring, and if $a$ and $b$ are elements of $R$ whose product $ab$ is zero, then $a = 0$ or $b = 0$.


Any subring of a field is a domain, and if R is a domain, the polynomial ring R[x] is also a 
domain. 

An element a of a ring is called a zero divisor if it is nonzero, and if there is another 
nonzero element b such that ab = 0. An integral domain is a nonzero ring which contains 
no zero divisors. 

An integral domain R satisfies the cancellation law: 


(11.7.1) If $ab = ac$ and $a \neq 0$, then $b = c$.


For, from $ab = ac$ it follows that $a(b - c) = 0$. Then since $a \neq 0$ and since $R$ is a domain,
$b - c = 0$. □


Theorem 11.7.2 Let $F$ be the set of equivalence classes of fractions of elements of an
integral domain $R$.


(a) With the laws defined as above, $F$ is a field, called the fraction field of $R$.
(b) $R$ embeds as a subring of $F$ by the rule $a \rightsquigarrow a/1$.


(c) Mapping Property: If $R$ is embedded as a subring of another field $\mathcal{F}$, the rule $a/b = ab^{-1}$
embeds $F$ into $\mathcal{F}$ too.


The phrase “mapping property” is explained as follows: To write the property carefully, one
should imagine that the embedding of $R$ into $\mathcal{F}$ is given by an injective ring homomorphism
$\varphi: R \to \mathcal{F}$. The assertion is then that the rule $\Phi(a/b) = \varphi(a)\varphi(b)^{-1}$ extends $\varphi$ to an
injective homomorphism $\Phi: F \to \mathcal{F}$.

The proof of Theorem 11.7.2 has many parts. One must verify that what we call
equivalence of fractions is indeed an equivalence relation, that addition and multiplication
are well-defined on equivalence classes, that the axioms for a field hold, and that sending
$a \rightsquigarrow a/1$ is an injective homomorphism $R \to F$. Then one must check the mapping property.
All of these verifications are straightforward.

If we were the first people who wished to use fractions in a ring, we’d be nervous and 
would want to go carefully through each of the verifications. But they have been made many 
times. It seems sufficient to check a few of them to get a sense of what is involved. 

Let us check that equivalence of fractions is a transitive relation. Suppose that
$a_1/b_1 \approx a_2/b_2$ and also that $a_2/b_2 \approx a_3/b_3$. Then $a_1 b_2 = a_2 b_1$ and $a_2 b_3 = a_3 b_2$. We multiply
by $b_3$ and $b_1$:

$a_1 b_2 b_3 = a_2 b_1 b_3$ and $a_2 b_3 b_1 = a_3 b_2 b_1$.


Therefore $a_1 b_2 b_3 = a_3 b_2 b_1$. Cancelling $b_2$, $a_1 b_3 = a_3 b_1$. Thus $a_1/b_1 \approx a_3/b_3$. Since we
used the cancellation law, the fact that $R$ is a domain is essential here.

Next, we show that addition of fractions is well-defined. Suppose that $a/b \approx a'/b'$
and $c/d \approx c'/d'$. We must show that $a/b + c/d \approx a'/b' + c'/d'$, and to do that, we cross
multiply the expressions for the sums. We must show that $u = (ad + bc)(b'd')$ is equal to
$v = (a'd' + b'c')(bd)$. The relations $ab' = a'b$ and $cd' = c'd$ show that


$u = adb'd' + bcb'd' = a'dbd' + bc'b'd = v.$


Verification of the mapping property is routine too. The only thing worth remarking is
that, if $R$ is contained in $\mathcal{F}$ and if $a/b$ is a fraction, then $b \neq 0$, so the rule $a/b = ab^{-1}$ makes
sense.
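The role of the cancellation law in the transitivity check can be seen experimentally. The sketch below is our own illustration with hypothetical function names: it treats pairs `(a, b)` with `b` nonzero as fractions over $\mathbb{Z}/(n)$, tests equivalence by cross multiplying, and searches for triples on which transitivity fails. Over $\mathbb{Z}/(6)$, which is not a domain, failures exist; over the field $\mathbb{Z}/(7)$ there are none.

```python
from itertools import product

def equivalent(p, q, n):
    """Cross-multiplication test a*d = c*b in Z/(n) for fractions p = a/b and q = c/d."""
    (a, b), (c, d) = p, q
    return (a * d - c * b) % n == 0

def transitivity_failures(n):
    """All triples (p, q, r) of fractions over Z/(n) with p ~ q and q ~ r but not p ~ r."""
    fracs = [(a, b) for a in range(n) for b in range(1, n)]   # denominators are nonzero
    return [
        (p, q, r)
        for p, q, r in product(fracs, repeat=3)
        if equivalent(p, q, n) and equivalent(q, r, n) and not equivalent(p, r, n)
    ]

print(((1, 1), (2, 2), (4, 1)) in transitivity_failures(6))   # True: 1/1 ~ 2/2 ~ 4/1, but 1/1 is not ~ 4/1
print(transitivity_failures(7))                               # []
```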

As mentioned above, a fraction of polynomials is called a rational function, and the 
fraction field of the polynomial ring K[x], where K is a field, is called the field of rational 
functions in x, with coefficients in K. This field is usually denoted by K(x): 


(11.7.3) $K(x) = \{$ equivalence classes of fractions $f/g$, where $f$ and $g$
are polynomials, and $g$ is not the zero polynomial $\}$

The rational functions we define here are equivalence classes of fractions of the formal
polynomials that were defined in Section 11.2. If $K = \mathbb{R}$, evaluation of a rational function
$f(x)/g(x)$ defines an actual function on the real line, wherever $g(x) \neq 0$. But as with
polynomials, we should distinguish the formally defined rational functions, which are
fractions of formal polynomials, from the functions that they define.


11.8 MAXIMAL IDEALS 


In this section we investigate the kernels of surjective homomorphisms
(11.8.1) $\varphi: R \to F$


from a ring $R$ to a field $F$.

Let $\varphi$ be such a map. The field $F$ has just two ideals, the zero ideal $(0)$ and the unit
ideal $(1)$ (11.3.19). The inverse image of the zero ideal is the kernel $I$ of $\varphi$, and the inverse
image of the unit ideal is the unit ideal of $R$. The Correspondence Theorem tells us that the
only ideals of $R$ that contain $I$ are $I$ and $R$. Because of this, $I$ is called a maximal ideal.


• A maximal ideal $M$ of a ring $R$ is an ideal that isn’t equal to $R$, and that isn’t contained in
any ideal other than $M$ and $R$: If an ideal $I$ contains $M$, then $I = M$ or $I = R$.


Proposition 11.8.2

(a) Let $\varphi: R \to R'$ be a surjective ring homomorphism, with kernel $I$. The image $R'$ is a
field if and only if $I$ is a maximal ideal.

(b) An ideal $I$ of a ring $R$ is maximal if and only if $\bar{R} = R/I$ is a field.

(c) The zero ideal of a ring $R$ is maximal if and only if $R$ is a field.


Proof. (a) A ring is a field if and only if it contains precisely two ideals (11.3.19), so the Correspondence
Theorem asserts that the image of $\varphi$ is a field if and only if there are precisely two ideals that
contain its kernel $I$. This will be true if and only if $I$ is a maximal ideal.


Parts (b) and (c) follow when (a) is applied to the canonical map $R \to R/I$. □

Proposition 11.8.3 The maximal ideals of the ring $\mathbb{Z}$ of integers are the principal ideals
generated by prime integers.


Proof. Every ideal of $\mathbb{Z}$ is principal. Consider a principal ideal $(n)$, with $n \geq 0$. If $n$ is a
prime, say $n = p$, then $\mathbb{Z}/(n) = \mathbb{F}_p$, a field. The ideal $(n)$ is maximal. If $n$ is not prime, there
are three possibilities: $n = 0$, $n = 1$, or $n$ factors. Neither the zero ideal nor the unit ideal
is maximal. If $n$ factors, say $n = ab$, with $1 < a < n$, then $1 \notin (a)$, $a \notin (n)$, and $n \in (a)$.
Therefore $(n) \subsetneq (a) \subsetneq (1)$. The ideal $(n)$ is not maximal. □
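Combined with Proposition 11.8.2(b), this says that $\mathbb{Z}/(n)$ is a field exactly when $n$ is prime. The following brute-force check is a sketch of our own (the function names are hypothetical) and covers only $n \geq 1$; the case $n = 0$, where the quotient is $\mathbb{Z}$ itself, cannot be tested by enumeration.

```python
def is_field(n):
    """Z/(n) is a field if it is not the zero ring and every nonzero class has an inverse."""
    return n > 1 and all(
        any((a * b) % n == 1 for b in range(1, n)) for a in range(1, n)
    )

def is_prime(n):
    """Trial division, adequate for the small values tested here."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

print([n for n in range(1, 30) if is_field(n)])
print(all(is_field(n) == is_prime(n) for n in range(1, 60)))   # True
```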


• A polynomial with coefficients in a field is called irreducible if it is not constant and if it is
not the product of two polynomials, neither of which is a constant.


Proposition 11.8.4 


(a) Let $F$ be a field. The maximal ideals of $F[x]$ are the principal ideals generated by the
monic irreducible polynomials.

(b) Let $\varphi: F[x] \to R'$ be a homomorphism to an integral domain $R'$, and let $P$ be the kernel
of $\varphi$. Either $P$ is a maximal ideal, or $P = (0)$.


The proof of part (a) is analogous to the proof just given. We omit the proof of (b). □


Corollary 11.8.5 There is a bijective correspondence between maximal ideals of the
polynomial ring $\mathbb{C}[x]$ in one variable and points in the complex plane. The maximal ideal
$M_a$ that corresponds to a point $a$ of $\mathbb{C}$ is the kernel of the substitution homomorphism
$s_a: \mathbb{C}[x] \to \mathbb{C}$ that sends $x \rightsquigarrow a$. It is the principal ideal generated by the linear polynomial
$x - a$.


Proof. The kernel $M_a$ of the substitution homomorphism $s_a$ consists of the polynomials
that have $a$ as a root, which are those divisible by $x - a$. So $M_a = (x - a)$. Conversely, let
$M$ be a maximal ideal of $\mathbb{C}[x]$. Then $M$ is generated by a monic irreducible polynomial. The
monic irreducible polynomials in $\mathbb{C}[x]$ are the polynomials $x - a$. □


The next theorem extends this corollary to polynomial rings in several variables.


Theorem 11.8.6 Hilbert’s Nullstellensatz.¹ The maximal ideals of the polynomial ring
$\mathbb{C}[x_1, \ldots, x_n]$ are in bijective correspondence with points of complex $n$-dimensional space.
A point $a = (a_1, \ldots, a_n)$ of $\mathbb{C}^n$ corresponds to the kernel $M_a$ of the substitution map
$s_a: \mathbb{C}[x_1, \ldots, x_n] \to \mathbb{C}$ that sends $x_i \rightsquigarrow a_i$. The kernel $M_a$ is generated by the $n$ linear
polynomials $x_i - a_i$.


Proof. Let $a$ be a point of $\mathbb{C}^n$, and let $M_a$ be the kernel of $s_a$. Since $s_a$ is surjective and since
$\mathbb{C}$ is a field, $M_a$ is a maximal ideal. To verify that $M_a$ is generated by the linear polynomials
as asserted, we first consider the case that the point $a$ is the origin $(0, \ldots, 0)$. We must show
that the kernel of the map $s_0$ that evaluates a polynomial at the origin is generated by the
variables $x_1, \ldots, x_n$. Well, $f(0, \ldots, 0) = 0$ if and only if the constant term of $f$ is zero. If
so, then every monomial that occurs in $f$ is divisible by at least one of the variables, so $f$ can
be written as a linear combination of the variables, with polynomial coefficients. The proof
for an arbitrary point $a$ can be made using the change of variable $x_i = x_i' + a_i$ to move $a$ to
the origin.

¹ The German word Nullstellensatz is a combination of three words whose translations are zero, places, theorem.

It is harder to prove that every maximal ideal has the form $M_a$. Let $M$ be a maximal
ideal, and let $\mathcal{F}$ denote the field $\mathbb{C}[x_1, \ldots, x_n]/M$. We restrict the canonical map (11.4.1)
$\pi: \mathbb{C}[x_1, \ldots, x_n] \to \mathcal{F}$ to the subring $\mathbb{C}[x_1]$ of polynomials in the first variable, obtaining
a homomorphism $\varphi: \mathbb{C}[x_1] \to \mathcal{F}$. Proposition 11.8.4 shows that the kernel of $\varphi$ is either the
zero ideal, or one of the maximal ideals $(x_1 - a_1)$ of $\mathbb{C}[x_1]$. We’ll show that it cannot be the
zero ideal. The same will be true when the index 1 is replaced by any other index, so $M$ will
contain linear polynomials of the form $x_i - a_i$ for each $i$. This will show that $M$ contains one
of the ideals $M_a$, and since $M_a$ is maximal, $M$ will be equal to that ideal.

In what follows, we drop the subscript from $x_1$. We suppose that $\ker \varphi = (0)$. Then
$\varphi$ maps $\mathbb{C}[x]$ isomorphically to its image, a subring of $\mathcal{F}$. The mapping property of fraction
fields shows that this map extends to an injective map $\mathbb{C}(x) \to \mathcal{F}$, where $\mathbb{C}(x)$ is the field of
rational functions, that is, the field of fractions of the polynomial ring $\mathbb{C}[x]$. So $\mathcal{F}$ contains a field
isomorphic to $\mathbb{C}(x)$. The next lemma shows that this is impossible. Therefore $\ker \varphi \neq (0)$.


Lemma 11.8.7 


(a) Let $R$ be a ring that contains the complex numbers $\mathbb{C}$ as a subring. The laws of
composition on $R$ can be used to make $R$ into a complex vector space.

(b) As a vector space, the field $\mathcal{F} = \mathbb{C}[x_1, \ldots, x_n]/M$ is spanned by a countable set of
elements.

(c) Let $V$ be a vector space over a field, and suppose that $V$ is spanned by a countable set
of vectors. Then every independent subset of $V$ is finite or countably infinite.

(d) When $\mathbb{C}(x)$ is made into a vector space over $\mathbb{C}$, the uncountable set of rational functions
$(x - \alpha)^{-1}$, with $\alpha$ in $\mathbb{C}$, is independent.


Assume that the lemma has been proved. Then (b) and (c) show that every independent set
in $\mathcal{F}$ is finite or countably infinite. On the other hand, $\mathcal{F}$ contains a subring isomorphic to
$\mathbb{C}(x)$, so by (d), $\mathcal{F}$ contains an uncountable independent set. This is a contradiction. □


Proof of the Lemma. (a) For addition, one uses the addition law in $R$. Scalar multiplication
$ca$ of an element $a$ of $R$ by an element $c$ of $\mathbb{C}$ is defined by multiplying these elements in $R$.
The axioms for a vector space follow from the ring axioms.


(b) The surjective homomorphism $\pi: \mathbb{C}[x_1, \ldots, x_n] \to \mathcal{F}$ defines a map $\mathbb{C} \to \mathcal{F}$, by means
of which we identify $\mathbb{C}$ as a subring of $\mathcal{F}$, and make $\mathcal{F}$ into a complex vector space. The
countable set of monomials $x_1^{i_1} \cdots x_n^{i_n}$ forms a basis for $\mathbb{C}[x_1, \ldots, x_n]$, and since $\pi$ is
surjective, the images of these monomials span $\mathcal{F}$.


(c) Let $S$ be a countable set that spans $V$, say $S = \{v_1, v_2, \ldots\}$. It could be finite or infinite.
Let $S_n$ be the subset $(v_1, \ldots, v_n)$ consisting of the first $n$ elements of $S$, and let $V_n$ be the
span of $S_n$. If $S$ is infinite, there will be infinitely many of these subspaces. Since $S$ spans $V$,
every element of $V$ is a linear combination of finitely many elements of $S$, so it is in one of
the spaces $V_n$. In other words, $\bigcup V_n = V$.

Let $L$ be an independent set in $V$, and let $L_n = L \cap V_n$. Then $L_n$ is a linearly
independent subset of the space $V_n$, which is spanned by a set of $n$ elements. So $|L_n| \leq n$
(3.4.18). Moreover, $L = \bigcup L_n$, because $V = \bigcup V_n$. The union of countably many finite sets
is finite or countably infinite.


(d) We must remember that linear combinations can involve only finitely many vectors. So
we ask: Can we have a linear relation

$$\sum_{\nu=1}^{k} \frac{c_\nu}{x - \alpha_\nu} = 0,$$

where $\alpha_1, \ldots, \alpha_k$ are distinct complex numbers and the coefficients $c_\nu$ aren’t zero? No. Such
a linear combination of formal rational functions defines a complex valued function except
at the points $x = \alpha_\nu$. If the linear combination were zero, the function it defines would be
identically zero. But $(x - \alpha_1)^{-1}$ takes on arbitrarily large values near $\alpha_1$, while $(x - \alpha_\nu)^{-1}$
is bounded near $\alpha_1$ for $\nu = 2, \ldots, k$. So the linear combination does not define the zero
function. □


11.9 ALGEBRAIC GEOMETRY 


A point $(a_1, \ldots, a_n)$ of $\mathbb{C}^n$ is called a zero of a polynomial $f(x_1, \ldots, x_n)$ of $n$ variables
if $f(a_1, \ldots, a_n) = 0$. We also say that the polynomial $f$ vanishes at such a point. The
common zeros of a set $\{f_1, \ldots, f_r\}$ of polynomials are the points of $\mathbb{C}^n$ at which all of them
vanish, that is, the solutions of the system of equations $f_1 = \cdots = f_r = 0$.


• A subset $V$ of complex $n$-space $\mathbb{C}^n$ that is the set of common zeros of a finite number of
polynomials in $n$ variables is called an algebraic variety, or just a variety.


For instance, a complex line in the $(x, y)$-plane $\mathbb{C}^2$ is, by definition, the set of solutions
of a linear equation $ax + by + c = 0$. This is a variety. So is a point. The point $(a, b)$ of $\mathbb{C}^2$
is the set of common zeros of the two polynomials $x - a$ and $y - b$. The group $SL_2(\mathbb{C})$ is a
variety in $\mathbb{C}^{2 \times 2}$. It is the set of zeros of the polynomial $x_{11}x_{22} - x_{12}x_{21} - 1$.


The Nullstellensatz provides an important link between algebra and geometry. It tells
us that the maximal ideals in the polynomial ring $\mathbb{C}[x_1, \ldots, x_n]$ correspond to points in
$\mathbb{C}^n$. This correspondence also relates algebraic varieties to quotient rings of the polynomial
ring.


Theorem 11.9.1 Let $I$ be the ideal of $\mathbb{C}[x_1, \ldots, x_n]$ generated by some polynomials
$f_1, \ldots, f_r$, and let $R$ be the quotient ring $\mathbb{C}[x_1, \ldots, x_n]/I$. Let $V$ be the variety of (common)
zeros of the polynomials $f_1, \ldots, f_r$ in $\mathbb{C}^n$. The maximal ideals of $R$ are in bijective
correspondence with the points of $V$.


Proof. The maximal ideals of $R$ correspond to the maximal ideals of $\mathbb{C}[x_1, \ldots, x_n]$ that
contain $I$ (Correspondence Theorem). An ideal of $\mathbb{C}[x_1, \ldots, x_n]$ will contain $I$ if and only
if it contains the generators $f_1, \ldots, f_r$ of $I$. Every maximal ideal of the ring $\mathbb{C}[x_1, \ldots, x_n]$
is the kernel $M_a$ of the substitution map that sends $x_i \rightsquigarrow a_i$ for some point $a = (a_1, \ldots, a_n)$
of $\mathbb{C}^n$, and the polynomials $f_1, \ldots, f_r$ are in $M_a$ if and only if $f_1(a) = \cdots = f_r(a) = 0$,
which is to say, if and only if $a$ is a point of $V$. □

As this theorem suggests, algebraic properties of the ring $R = \mathbb{C}[x]/I$ are closely related
to geometric properties of the variety $V$. The analysis of this relationship is the field of
mathematics called algebraic geometry.


A simple question one might ask about a set is whether or not it is empty. Is it possible 
for a ring to have no maximal ideals at all? This happens only for the zero ring. 


Theorem 11.9.2 Let $R$ be a ring. Every ideal $I$ of $R$ that is not $R$ itself is contained in a
maximal ideal.


To find a maximal ideal, one might try this procedure: If $I$ is not maximal, choose a proper
ideal $I'$ that is larger than $I$. Replace $I$ by $I'$, and repeat. The proof follows this line of
reasoning, but one may have to repeat the procedure many times, possibly uncountably
often. Because of this, the proof requires the Axiom of Choice, or Zorn’s Lemma (see the
Appendix). The Hilbert Basis Theorem, which we will prove later (14.6.7), shows that for
most rings that we study, the proof requires only a weak countable version of the Axiom of
Choice. Rather than enter into a discussion of the Axiom of Choice here, we defer further
discussion of the proof to Chapter 14. □


Corollary 11.9.3 The only ring $R$ having no maximal ideals is the zero ring.


This follows from the theorem, because every nonzero ring $R$ contains an ideal different
from $R$: the zero ideal. □


Putting Theorems 11.9.1 and 11.9.2 together gives us another corollary:


Corollary 11.9.4 If a system of polynomial equations $f_1 = \cdots = f_r = 0$ in $n$ variables has
no solution in $\mathbb{C}^n$, then 1 is a linear combination $1 = \sum g_i f_i$ with polynomial coefficients $g_i$.


Proof. If the system has no solution, there is no maximal ideal that contains the ideal
$I = (f_1, \ldots, f_r)$. So $I$ is the unit ideal, and 1 is in $I$. □


Example 11.9.5 Most choices of three polynomials $f_1, f_2, f_3$ in two variables have no
common solutions. For instance, the ideal of $\mathbb{C}[t, x]$ generated by


(11.9.6) $f_1 = t^2 + x^2 - 2, \quad f_2 = tx - 1, \quad f_3 = t^3 + 5tx^2 + 1$


is the unit ideal. This can be proved by showing that the equations $f_1 = f_2 = f_3 = 0$ have
no solution in $\mathbb{C}^2$. □
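The check that there is no common solution is short. The sketch below is our own and assumes the reading $f_1 = t^2 + x^2 - 2$, $f_2 = tx - 1$, $f_3 = t^3 + 5tx^2 + 1$ of (11.9.6). Since $f_1 - 2f_2 = (t - x)^2$, a common zero of $f_1$ and $f_2$ must have $t = x$, and then $f_2 = 0$ gives $t^2 = 1$. So the only candidates are $(1, 1)$ and $(-1, -1)$, and it remains to evaluate $f_3$ there.

```python
def f1(t, x):
    return t**2 + x**2 - 2

def f2(t, x):
    return t * x - 1

def f3(t, x):
    return t**3 + 5 * t * x**2 + 1

# f1 - 2*f2 = (t - x)^2, so common zeros of f1 and f2 satisfy t = x and t^2 = 1.
candidates = [(1, 1), (-1, -1)]

for t, x in candidates:
    print((t, x), f1(t, x), f2(t, x), f3(t, x))
# f1 and f2 vanish at both candidates, while f3 takes the values 7 and -5.
```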


It isn’t easy to get a clear geometric picture of an algebraic variety in $\mathbb{C}^n$, but the
general shape of a variety in $\mathbb{C}^2$ can be described fairly simply, and we do that here. We
work with the polynomial ring in the two variables $t$ and $x$.


Lemma 11.9.7 Let $f(t, x)$ be a polynomial, and let $\alpha$ be a complex number. The following
are equivalent:
(a) $f(t, x)$ vanishes at every point of the locus $\{t = \alpha\}$ in $\mathbb{C}^2$,
(b) The one-variable polynomial $f(\alpha, x)$ is the zero polynomial,
(c) $t - \alpha$ divides $f$ in $\mathbb{C}[t, x]$.


Proof. If $f$ vanishes at every point of the locus $t = \alpha$, the polynomial $f(\alpha, x)$ is zero for
every $x$. Then since a nonzero polynomial in one variable has finitely many roots, $f(\alpha, x)$ is
the zero polynomial. This shows that (a) implies (b).

A change of variable $t = t' + \alpha$ reduces the proof that (b) implies (c) to the case that
$\alpha = 0$. If $f(0, x)$ is the zero polynomial, then $t$ divides every monomial that occurs in $f$, and
$t$ divides $f$. Finally, the implication (c) implies (a) is clear. □


Let $F$ denote the field of rational functions $\mathbb{C}(t)$ in $t$, the field of fractions of the ring
$\mathbb{C}[t]$. The ring $\mathbb{C}[t, x]$ is a subring of the one-variable polynomial ring $F[x]$; its elements are
polynomials in $x$,


(11.9.8) $f(t, x) = a_n(t)x^n + \cdots + a_1(t)x + a_0(t),$


whose coefficients $a_i(t)$ are rational functions in $t$. It can be helpful to begin by studying
a problem about $\mathbb{C}[t, x]$ in the ring $F[x]$, because its algebra is simpler. Division with
remainder is available, and every ideal of $F[x]$ is principal.


Proposition 11.9.9 Let $h(t, x)$ and $f(t, x)$ be nonzero elements of $\mathbb{C}[t, x]$. Suppose that $h$
is not divisible by any polynomial of the form $t - \alpha$. If $h$ divides $f$ in $F[x]$, then $h$ divides $f$
in $\mathbb{C}[t, x]$.


Proof. We divide by $h$ in $F[x]$, say $f = hq$, and we show that $q$ is an element of $\mathbb{C}[t, x]$.
Since $q$ is an element of $F[x]$, it is a polynomial in $x$ whose coefficients are rational functions
in $t$. We multiply both sides of the equation $f = hq$ by a monic polynomial in $t$ to clear
denominators in these coefficients. This gives us an equation of the form


$u(t) f(t, x) = h(t, x) q_1(t, x),$


where $u(t)$ is a monic polynomial in $t$, and $q_1$ is an element of $\mathbb{C}[t, x]$. We use induction on
the degree of $u$. If $u$ has positive degree, it will have a complex root $\alpha$. Then $t - \alpha$ divides
the left side of this equation, so it divides the right side too. This means that $h(\alpha, x) q_1(\alpha, x)$
is the zero polynomial in $x$. By hypothesis, $t - \alpha$ does not divide $h$, so $h(\alpha, x)$ is not zero.
Since the polynomial ring $\mathbb{C}[x]$ is a domain, $q_1(\alpha, x) = 0$, and the lemma shows that $t - \alpha$
divides $q_1(t, x)$. We cancel $t - \alpha$ from $u$ and $q_1$. Induction completes the proof. □


Theorem 11.9.10 Two nonzero polynomials $f(t, x)$ and $g(t, x)$ in two variables have
only finitely many common zeros in $\mathbb{C}^2$, unless they have a common nonconstant factor
in $\mathbb{C}[t, x]$.


If the degrees of the polynomials $f$ and $g$ are $m$ and $n$ respectively, the number
of common zeros is at most $mn$. This is known as the Bézout bound. For instance, two
quadratic polynomials have at most four common zeros. (The analogue of this statement for
real polynomials is that two conics intersect in at most four points.) It is harder to prove the
Bézout bound than the finiteness. We won’t need that bound, so we won’t prove it.


Proof of Theorem 11.9.10. Assume that $f$ and $g$ have no common factor. Let $I$ denote the
ideal generated by $f$ and $g$ in $F[x]$, where $F = \mathbb{C}(t)$, as above. This is a principal ideal,
generated by the (monic) greatest common divisor $h$ of $f$ and $g$ in $F[x]$.

If $h \neq 1$, it will be a polynomial whose coefficients may have denominators that are
polynomials in $t$. We multiply by a polynomial in $t$ to clear these denominators, obtaining
a polynomial $h_1$ in $\mathbb{C}[t, x]$. We may assume that $h_1$ isn’t divisible by any polynomial $t - \alpha$.
Since the denominators are units in $F$ and since $h$ divides $f$ and $g$ in $F[x]$, $h_1$ also divides
$f$ and $g$ in $F[x]$. Proposition 11.9.9 shows that $h_1$ divides $f$ and $g$ in $\mathbb{C}[t, x]$. Then $f$ and $g$
have a common nonconstant factor in $\mathbb{C}[t, x]$. We’re assuming that this is not the case.

So the greatest common divisor of $f$ and $g$ in $F[x]$ is 1, and $1 = rf + sg$, where $r$ and
$s$ are elements of $F[x]$. We clear denominators from $r$ and $s$, multiplying both sides of the
equation by a suitable polynomial $u(t)$. This gives us an equation of the form


$u(t) = r_1(t, x) f(t, x) + s_1(t, x) g(t, x),$


where all terms on the right are polynomials in $\mathbb{C}[t, x]$. This equation shows that if $(t_0, x_0)$
is a common zero of $f$ and $g$, then $t_0$ must be a root of $u$. But $u$ is a polynomial in $t$, and
a nonzero polynomial in one variable has finitely many roots. So at the common zeros of
$f$ and $g$, the variable $t$ takes on only finitely many values. Similar reasoning shows that
$x$ takes on only finitely many values. This gives us only finitely many possibilities for the
common zeros. □
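To see what such an equation $u(t) = r_1 f + s_1 g$ looks like in practice, here is a small sketch of our own. The pair $f = t^2 + x^2 - 2$, $g = tx - 1$ and the multipliers $r_1 = t^2$, $s_1 = -(tx + 1)$ are chosen by hand for illustration and are not taken from the proof. Polynomials in $t, x$ are stored as dictionaries mapping exponent pairs $(i, j)$ to the coefficient of $t^i x^j$.

```python
def add(p, q):
    """Sum of two polynomials stored as {(i, j): coefficient of t^i x^j}."""
    out = dict(p)
    for key, c in q.items():
        out[key] = out.get(key, 0) + c
    return {key: c for key, c in out.items() if c != 0}

def mul(p, q):
    """Product of two polynomials in the same representation."""
    out = {}
    for (i, j), a in p.items():
        for (k, l), b in q.items():
            key = (i + k, j + l)
            out[key] = out.get(key, 0) + a * b
    return {key: c for key, c in out.items() if c != 0}

f = {(2, 0): 1, (0, 2): 1, (0, 0): -2}    # t^2 + x^2 - 2
g = {(1, 1): 1, (0, 0): -1}               # tx - 1
r1 = {(2, 0): 1}                          # t^2
s1 = {(1, 1): -1, (0, 0): -1}             # -(tx + 1)

u = add(mul(r1, f), mul(s1, g))
print(u == {(4, 0): 1, (2, 0): -2, (0, 0): 1})   # True: u(t) = t^4 - 2t^2 + 1 = (t^2 - 1)^2
```

No power of $x$ survives in $u$, so at a common zero of this $f$ and $g$ the coordinate $t_0$ must satisfy $(t_0^2 - 1)^2 = 0$, leaving only $t_0 = \pm 1$.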


Theorem 11.9.10 suggests that the most interesting varieties in $\mathbb{C}^2$ are those defined as
the locus of zeros of a single polynomial $f(t, x)$.


• The locus $X$ of zeros in $\mathbb{C}^2$ of a polynomial $f(t, x)$ is called the Riemann surface of $f$.


It is also called a plane algebraic curve, which is a confusing phrase. As a topological space, the
locus $X$ has dimension two. Calling it an algebraic curve refers to the fact that the points
of $X$ depend only on one complex parameter. We give a rough description of a Riemann
surface here. Let’s assume that the polynomial $f$ is irreducible, meaning that it is not a product of
two nonconstant polynomials, and also that it has positive degree in the variable $x$. Let


(11.9.11) $X = \{(t, x) \in \mathbb{C}^2 \mid f(t, x) = 0\}$


be its Riemann surface, and let $T$ denote the complex $t$-plane. Sending $(t, x) \rightsquigarrow t$ defines a
continuous map that we call a projection


(11.9.12) $\pi: X \to T$.


We will describe $X$ in terms of this projection. However, our description will require that a
finite set of “bad points” be removed from $X$. In fact, what is usually called the Riemann
surface agrees with our definition only when suitable finite subsets are removed. The locus
$\{f = 0\}$ may be “singular” at some points, and some other points of $X$ may be “at infinity.”
The points at infinity are explained below (see (11.9.17)).

The simplest examples of singular points are nodes, at which the surface crosses itself,
and cusps. The locus $x^2 = t^3 + t^2$ has a node at the origin, and the locus $x^2 = t^3$ has a cusp
at the origin. The real points of these Riemann surfaces are shown here.


[Figure: the real points of the two curves, labeled “a node” and “a cusp”.]


(11.9.13) Some Singular Curves


To avoid repetition of the disclaimer ‘“‘except on a finite set,” we write X’ for the 
complement of an unspecified finite subset of X, which is allowed to vary. Whenever 
a construction runs into trouble at some point, we simply delete that point. Essentially 
everything we do here and when we come back to Riemann surfaces in Chapter 15 will be 
valid only for X’. We keep X on hand for reference. 

Our description of the Riemann surface will be as a branched covering of the complex
t-plane T. The definition of covering space that we give here assumes that the spaces are
Hausdorff spaces ([Munkres] p. 98). You can ignore this point if you don’t know what it means.
The sets in which we are interested are Hausdorff spaces because they are subsets of $\mathbb{C}^2$.


Definition 11.9.14 Let X and T be Hausdorff spaces. A continuous map $\pi : X \to T$ is an
n-sheeted covering space if every fibre consists of n points, and if it has this property: Let
$x_0$ be a point of X and let $\pi(x_0) = t_0$. Then $\pi$ maps an open neighborhood U of $x_0$ in X
homeomorphically to an open neighborhood V of $t_0$ in T.


A map $\pi$ from X to the complex plane T is an n-sheeted branched covering if X contains
no isolated points, the fibres of $\pi$ are finite, and if there is a finite set A of points of T called
branch points, such that the map $(X - \pi^{-1}A) \to (T - A)$ is an n-sheeted covering space.
For emphasis, a covering space is sometimes called an unbranched covering.


Figure 11.9.15 below depicts the Riemann surface of the polynomial $x^2 - t$, a two-sheeted
covering of T that is branched at the point t = 0. The figure has been obtained by
writing t and x in terms of their real and imaginary parts, $t = t_0 + t_1 i$ and $x = x_0 + x_1 i$,
and dropping the imaginary part $x_1$ of x, to obtain a surface in three-dimensional space. Its
further projection to the plane is depicted using standard graphics.

The projected surface intersects itself along the negative $t_0$-axis, though the Riemann
surface itself does not. Every negative real number t has two purely imaginary square roots.
The real parts of these square roots are zero, and this produces the self-crossing in the
projected surface.
(11.9.14) Part of an unbranched covering. [Figure omitted]


(11.9.15) The Riemann surface $x^2 = t$. [Figure omitted]
Given a branched covering $X \to T$, we refer to the points in the set A as its branch
points, though this is imprecise: The defining property continues to hold when we add any
finite set of points to A. So we allow the possibility that some points of A don’t need to be
included, that is, that they aren’t “true” branch points.


Theorem 11.9.16 Let f(t, x) be an irreducible polynomial in $\mathbb{C}[t, x]$ which has positive
degree n in the variable x. The Riemann surface of f is an n-sheeted branched covering of
the complex plane T.


Proof. The main step is to verify the first condition of (11.9.14), that the fibre $\pi^{-1}(t_0)$ consists
of precisely n points except on a finite subset A.

The points of the fibre $\pi^{-1}(t_0)$ are the points $(t_0, x_0)$ such that $x_0$ is a root of the
one-variable polynomial $f(t_0, x)$. We must show that, except for a finite set of values $t = t_0$,
this polynomial has n distinct roots. We write f(t, x) as a polynomial in x whose coefficients
are polynomials in t, say $f(t, x) = a_n(t)x^n + \cdots + a_0(t)$, and we denote $a_i(t_0)$ by $a_i^0$. The
polynomial $f(t_0, x) = a_n^0 x^n + \cdots + a_1^0 x + a_0^0$ has degree at most n, so it has at most n roots.
Therefore the fibre $\pi^{-1}(t_0)$ contains at most n points. It will have fewer than n points if
either


(11.9.17)

(a) the degree of $f(t_0, x)$ is less than n, or
(b) $f(t_0, x)$ has a multiple root.


The first case occurs when $t_0$ is a root of $a_n(t)$. (If $t_0$ is a root of $a_n(t)$, one of the roots
of $f(t_1, x)$ tends to infinity as $t_1 \to t_0$.) Since $a_n(t)$ is a polynomial, there are finitely many
such values.

Consider the second case. A complex number $x_0$ is a multiple root of a polynomial
h(x) if $(x - x_0)^2$ divides h(x), and this happens if and only if $x_0$ is a common root of h(x)
and its derivative h'(x) (see Exercise 3.5). Here $h(x) = f(t_0, x)$. The first variable is fixed,
so the derivative is the partial derivative $\partial f/\partial x$. Going back to the polynomial f(t, x) in two
variables, we see that the second case occurs at the points $(t_0, x_0)$ that are common zeros of
f and $\partial f/\partial x$. Now f cannot divide its partial derivative, which has lower degree in x. Since f is
assumed to be irreducible, f and $\partial f/\partial x$ have no common nonconstant factor. Theorem 11.9.10
tells us that there are finitely many common zeros.

We now check the second condition of (11.9.14). Let $t_0$ be a point of T such that the
fibre $\pi^{-1}(t_0)$ consists of n points, and let $(t_0, x_0)$ be a point of X in the fibre. Then $x_0$ is
a simple root of $f(t_0, x)$, and therefore $\partial f/\partial x$ is not zero at this point. The Implicit Function
Theorem A.4.3 implies that one can solve for x as a function x(t) of t in a neighborhood of
$t_0$, such that $x(t_0) = x_0$. The neighborhood U referred to in the definition of covering space
is the graph of this function.
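
For the polynomial $x^2 - t$ of Figure 11.9.15, the fibres of the projection can be listed explicitly, which makes the count in the proof concrete. The following Python sketch is only an illustration and is not part of the text; the function name `fibre` is hypothetical, and floating-point square roots stand in for exact ones.

```python
import cmath

def fibre(t0):
    """Points (t0, x0) of X = {x^2 = t} lying over the point t0 of the t-plane."""
    r = cmath.sqrt(t0)
    # The roots of x^2 - t0 are r and -r. They coincide only when t0 = 0,
    # which is the single branch point of this covering.
    return {(complex(t0), r), (complex(t0), -r)}

print(len(fibre(1)))    # a fibre over a point other than 0
print(len(fibre(0)))    # the fibre over the branch point
print(fibre(-4))        # both square roots of a negative real number
```

Over a negative real t both roots have real part zero, which is the source of the self-crossing in the projected picture described above.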


To me algebraic geometry is algebra with a kick. 


—Solomon Lefschetz 
EXERCISES


Section 1 Definition of a Ring 


1.1. Prove that $7 + \sqrt{2}$ and $\sqrt{3} + \sqrt{-5}$ are algebraic numbers.

1.2. Prove that, for $n \neq 0$, $\cos(2\pi/n)$ is an algebraic number.

1.3. Let $\mathbb{Q}[\alpha, \beta]$ denote the smallest subring of $\mathbb{C}$ containing the rational numbers $\mathbb{Q}$ and the
elements $\alpha = \sqrt{2}$ and $\beta = \sqrt{3}$. Let $\gamma = \alpha + \beta$. Is $\mathbb{Q}[\alpha, \beta] = \mathbb{Q}[\gamma]$? Is $\mathbb{Z}[\alpha, \beta] = \mathbb{Z}[\gamma]$?

1.4. Let $\alpha = \frac{1}{2}i$. Prove that the elements of $\mathbb{Z}[\alpha]$ are dense in the complex plane.

1.5. Determine all subrings of $\mathbb{R}$ that are discrete sets.

1.6. Decide whether or not S is a subring of R, when

(a) S is the set of all rational numbers a/b, where b is not divisible by 3, and $R = \mathbb{Q}$,

(b) S is the set of functions which are linear combinations with integer coefficients of the
functions $\{1, \cos nt, \sin nt\}$, $n \in \mathbb{Z}$, and R is the set of all real valued functions of t.

1.7. Decide whether the given structure forms a ring. If it is not a ring, determine which of the
ring axioms hold and which fail:

(a) U is an arbitrary set, and R is the set of subsets of U. Addition and multiplication of
elements of R are defined by the rules $A + B = (A \cup B) - (A \cap B)$ and $A \cdot B = A \cap B$.

(b) R is the set of continuous functions $\mathbb{R} \to \mathbb{R}$. Addition and multiplication are defined
by the rules $[f + g](x) = f(x) + g(x)$ and $[f \circ g](x) = f(g(x))$.

1.8. Determine the units in: (a) $\mathbb{Z}/12\mathbb{Z}$, (b) $\mathbb{Z}/8\mathbb{Z}$, (c) $\mathbb{Z}/n\mathbb{Z}$.

1.9. Let R be a set with two laws of composition satisfying all ring axioms except the
commutative law for addition. Use the distributive law to prove that the commutative law
for addition holds, so that R is a ring.


Section 2 Polynomial Rings


2.1. For which positive integers n does $x^2 + x + 1$ divide $x^4 + 3x^3 + x^2 + 7x + 5$ in $[\mathbb{Z}/(n)][x]$?

2.2. Let F be a field. The set of all formal power series $p(t) = a_0 + a_1 t + a_2 t^2 + \cdots$, with $a_i$
in F, forms a ring that is often denoted by F[[t]]. By formal power series we mean that
the coefficients form an arbitrary sequence of elements of F. There is no requirement of
convergence. Prove that F[[t]] is a ring, and determine the units in this ring.


Section 3 Homomorphisms and Ideals 


3.1. Prove that an ideal of a ring R is a subgroup of the additive group $R^+$.

3.2. Prove that every nonzero ideal in the ring of Gauss integers contains a nonzero integer.

3.3. Find generators for the kernels of the following maps:

(a) $\mathbb{R}[x, y] \to \mathbb{R}$ defined by $f(x, y) \rightsquigarrow f(0, 0)$,

(b) $\mathbb{R}[x] \to \mathbb{C}$ defined by $f(x) \rightsquigarrow f(2 + i)$,

(c) $\mathbb{Z}[x] \to \mathbb{R}$ defined by $f(x) \rightsquigarrow f(1 + \sqrt{2})$,

(d) $\mathbb{Z}[x] \to \mathbb{C}$ defined by $x \rightsquigarrow \sqrt{2} + \sqrt{3}$,

(e) $\mathbb{C}[x, y, z] \to \mathbb{C}[t]$ defined by $x \rightsquigarrow t$, $y \rightsquigarrow t^2$, $z \rightsquigarrow t^3$.

3.4. Let $\varphi : \mathbb{C}[x, y] \to \mathbb{C}[t]$ be the homomorphism that sends $x \rightsquigarrow t + 1$ and $y \rightsquigarrow t^3 - 1$. Determine
the kernel K of $\varphi$, and prove that every ideal I of $\mathbb{C}[x, y]$ that contains K can be generated
by two elements.

3.5. The derivative of a polynomial f with coefficients in a field F is defined by the calculus
formula $(a_n x^n + \cdots + a_1 x + a_0)' = n a_n x^{n-1} + \cdots + 1 a_1$. The integer coefficients are
interpreted in F using the unique homomorphism $\mathbb{Z} \to F$.

(a) Prove the product rule $(fg)' = f'g + fg'$ and the chain rule $(f \circ g)' = (f' \circ g)g'$.

(b) Let $\alpha$ be an element of F. Prove that $\alpha$ is a multiple root of a polynomial f if and only
if it is a common root of f and of its derivative f'.

3.6. An automorphism of a ring R is an isomorphism from R to itself. Let R be a ring,
and let f(y) be a polynomial in one variable with coefficients in R. Prove that the map
$R[x, y] \to R[x, y]$ defined by $x \rightsquigarrow x + f(y)$, $y \rightsquigarrow y$ is an automorphism of R[x, y].

3.7. Determine the automorphisms of the polynomial ring $\mathbb{Z}[x]$ (see Exercise 3.6).

3.8. Let R be a ring of prime characteristic p. Prove that the map $R \to R$ defined by $x \rightsquigarrow x^p$ is
a ring homomorphism. (It is called the Frobenius map.)

3.9. (a) An element x of a ring R is called nilpotent if some power is zero. Prove that if x is
nilpotent, then 1 + x is a unit.

(b) Suppose that R has prime characteristic $p \neq 0$. Prove that if a is nilpotent then 1 + a is
unipotent, that is, some power of 1 + a is equal to 1.

3.10. Determine all ideals of the ring F[[t]] of formal power series with coefficients in a field F
(see Exercise 2.2).

3.11. Let R be a ring, and let I be an ideal of the polynomial ring R[x]. Let n be the lowest
degree among nonzero elements of I. Prove or disprove: I contains a monic polynomial of
degree n if and only if it is a principal ideal.

3.12. Let I and J be ideals of a ring R. Prove that the set I + J of elements of the form x + y,
with x in I and y in J, is an ideal. This ideal is called the sum of the ideals I and J.

3.13. Let I and J be ideals of a ring R. Prove that the intersection $I \cap J$ is an ideal. Show by
example that the set of products $\{xy \mid x \in I, y \in J\}$ need not be an ideal, but that the set
of finite sums $\sum x_\nu y_\nu$ of products of elements of I and J is an ideal. This ideal is called
the product ideal, and is denoted by IJ. Is there a relation between IJ and $I \cap J$?


Section 4 Quotient Rings


4.1. Consider the homomorphism $\mathbb{Z}[x] \to \mathbb{Z}$ that sends $x \rightsquigarrow 1$. Explain what the Correspondence
Theorem, when applied to this map, says about ideals of $\mathbb{Z}[x]$.

4.2. What does the Correspondence Theorem tell us about ideals of $\mathbb{Z}[x]$ that contain $x^2 + 1$?

4.3. Identify the following rings: (a) $\mathbb{Z}[x]/(x^2 - 3, 2x + 4)$, (b) $\mathbb{Z}[i]/(2 + i)$,
(c) $\mathbb{Z}[x]/(6, 2x - 1)$, (d) $\mathbb{Z}[x]/(2x^2 - 4, 4x - 5)$, (e) $\mathbb{Z}[x]/(x^2 + 3, 5)$.

4.4. Are the rings $\mathbb{Z}[x]/(x^2 + 7)$ and $\mathbb{Z}[x]/(2x^2 + 7)$ isomorphic?
Section 5 Adjoining Elements


5.1. Let $f = x^4 + x^3 + x^2 + x + 1$ and let $\alpha$ denote the residue of x in the ring $R = \mathbb{Z}[x]/(f)$.
Express $(\alpha^3 + \alpha^2 + \alpha)(\alpha^5 + 1)$ in terms of the basis $(1, \alpha, \alpha^2, \alpha^3)$ of R.

5.2. Let a be an element of a ring R. If we adjoin an element $\alpha$ with the relation $\alpha = a$, we
expect to get a ring isomorphic to R. Prove that this is true.

5.3. Describe the ring obtained from $\mathbb{Z}/12\mathbb{Z}$ by adjoining an inverse of 2.

5.4. Determine the structure of the ring R’ obtained from $\mathbb{Z}$ by adjoining an element $\alpha$ satisfying
each set of relations.

(a) $2\alpha = 6$, $6\alpha = 15$, (b) $2\alpha - 6 = 0$, $\alpha - 10 = 0$, (c) $\alpha^3 + \alpha^2 + 1 = 0$, $\alpha^2 + \alpha = 0$.

5.5. Are there fields F such that the rings $F[x]/(x^2)$ and $F[x]/(x^2 - 1)$ are isomorphic?

5.6. Let a be an element of a ring R, and let R’ be the ring R[x]/(ax − 1) obtained by adjoining
an inverse of a to R. Let $\alpha$ denote the residue of x (the inverse of a in R’).

(a) Show that every element $\beta$ of R’ can be written in the form $\beta = \alpha^k b$, with b in R.

(b) Prove that the kernel of the map $R \to R'$ is the set of elements b of R such that
$a^n b = 0$ for some n > 0.

(c) Prove that R’ is the zero ring if and only if a is nilpotent (see Exercise 3.9).

5.7. Let F be a field and let R = F[t] be the polynomial ring. Let R’ be the ring extension
R[x]/(tx − 1) obtained by adjoining an inverse of t to R. Prove that this ring can be
identified as the ring of Laurent polynomials, which are finite linear combinations of
powers of t, negative exponents included.


Section 6 Product Rings 


6.1. Let $\varphi : \mathbb{R}[x] \to \mathbb{C} \times \mathbb{C}$ be the homomorphism defined by $\varphi(x) = (1, i)$ and $\varphi(r) = (r, r)$
for r in $\mathbb{R}$. Determine the kernel and the image of $\varphi$.

6.2. Is $\mathbb{Z}/(6)$ isomorphic to the product ring $\mathbb{Z}/(2) \times \mathbb{Z}/(3)$? Is $\mathbb{Z}/(8)$ isomorphic to
$\mathbb{Z}/(2) \times \mathbb{Z}/(4)$?

6.3. Classify rings of order 10.

6.4. In each case, describe the ring obtained from the field $\mathbb{F}_2$ by adjoining an element $\alpha$
satisfying the given relation:

(a) $\alpha^2 + \alpha + 1 = 0$, (b) $\alpha^2 + 1 = 0$, (c) $\alpha^2 + \alpha = 0$.

6.5. Suppose we adjoin an element $\alpha$ satisfying the relation $\alpha^2 = 1$ to the real numbers $\mathbb{R}$.
Prove that the resulting ring is isomorphic to the product $\mathbb{R} \times \mathbb{R}$.

6.6. Describe the ring obtained from the product ring $\mathbb{R} \times \mathbb{R}$ by inverting the element (2, 0).

6.7. Prove that in the ring $\mathbb{Z}[x]$, the intersection $(2) \cap (x)$ of the principal ideals (2) and (x)
is the principal ideal (2x), and that the quotient ring $R = \mathbb{Z}[x]/(2x)$ is isomorphic to the
subring of the product ring $\mathbb{F}_2[x] \times \mathbb{Z}$ of pairs (f(x), n) such that $f(0) \equiv n$ modulo 2.

6.8. Let I and J be ideals of a ring R such that I + J = R.

(a) Prove that $IJ = I \cap J$ (see Exercise 3.13).

(b) Prove the Chinese Remainder Theorem: For any pair a, b of elements of R, there is an
element x such that $x \equiv a$ modulo I and $x \equiv b$ modulo J. (The notation $x \equiv a$ modulo
I means $x - a \in I$.)

(c) Prove that if IJ = 0, then R is isomorphic to the product ring $(R/I) \times (R/J)$.

(d) Describe the idempotents corresponding to the product decomposition in (c).


Section 7 Fractions 


7.1. Prove that a domain of finite order is a field.

7.2. Let R be a domain. Prove that the polynomial ring R[x] is a domain, and identify the units
in R[x].

7.3. Is there a domain that contains exactly 15 elements?

7.4. Prove that the field of fractions of the formal power series ring F[[x]] over a field F can be
obtained by inverting the element x. Find a neat description of the elements of that field
(see Exercise 11.2.1).

7.5. A subset S of a domain R that is closed under multiplication and that does not contain 0 is
called a multiplicative set. Given a multiplicative set S, define S-fractions to be elements of
the form a/b, where b is in S. Show that the equivalence classes of S-fractions form a ring.


Section 8 Maximal Ideals 


8.1. Which principal ideals in $\mathbb{Z}[x]$ are maximal ideals?

8.2. Determine the maximal ideals of each of the following rings:

(a) $\mathbb{R} \times \mathbb{R}$, (b) $\mathbb{R}[x]/(x^2)$, (c) $\mathbb{R}[x]/(x^2 - 3x + 2)$, (d) $\mathbb{R}[x]/(x^2 + x + 1)$.

8.3. Prove that the ring $\mathbb{F}_2[x]/(x^3 + x + 1)$ is a field, but that $\mathbb{F}_3[x]/(x^3 + x + 1)$ is not a field.

8.4. Establish a bijective correspondence between maximal ideals of $\mathbb{R}[x]$ and points in the
upper half plane.


Section 9 Algebraic Geometry


9.1. Let I be the principal ideal of $\mathbb{C}[x, y]$ generated by the polynomial $y^2 + x^3 - 17$. Which of the
following sets generate maximal ideals in the quotient ring $R = \mathbb{C}[x, y]/I$? $(x - 1, y - 4)$,
$(x + 1, y + 4)$, $(x^3 - 17, y^2)$.

9.2. Let $f_1, \ldots, f_r$ be complex polynomials in the variables $x_1, \ldots, x_n$, let V be the variety
of their common zeros, and let I be the ideal of the polynomial ring $R = \mathbb{C}[x_1, \ldots, x_n]$
that they generate. Define a homomorphism from the quotient ring $\overline{R} = R/I$ to the ring
$\mathcal{R}$ of continuous, complex-valued functions on V.

9.3. Let $U = \{f_i(x_1, \ldots, x_m) = 0\}$, $V = \{g_j(y_1, \ldots, y_n) = 0\}$ be varieties in $\mathbb{C}^m$ and $\mathbb{C}^n$,
respectively. Show that the variety defined by the equations $\{f_i(x) = 0, g_j(y) = 0\}$ in
x, y-space $\mathbb{C}^{m+n}$ is the product set $U \times V$.

9.4. Let U and V be varieties in $\mathbb{C}^n$. Prove that the union $U \cup V$ and the intersection $U \cap V$
are varieties. What does the statement $U \cap V = \emptyset$ mean algebraically? What about the
statement $U \cup V = \mathbb{C}^n$?

9.5. Prove that the variety of zeros of a set $\{f_1, \ldots, f_r\}$ of polynomials depends only on the
ideal that they generate.

9.6. Prove that every variety in $\mathbb{C}^2$ is the union of finitely many points and algebraic curves.

9.7. Determine the points of intersection in $\mathbb{C}^2$ of the two loci in each of the following cases:
(a) $y^2 - x^3 + x^2 = 1$, $x + y = 1$, (b) $x^2 + xy + y^2 = 1$, $x^2 + 2y^2 = 1$,

(c) $y^2 = x^3$, $xy = 1$, (d) $x + y^2 = 0$, $y + x^2 + 2xy^2 + y^4 = 0$.
9.8. Which ideals in the polynomial ring $\mathbb{C}[x, y]$ contain $x^2 + y^2 - 5$ and $xy - 2$?

9.9. An irreducible plane algebraic curve C is the locus of zeros in $\mathbb{C}^2$ of an irreducible
polynomial f(x, y). A point p of C is a singular point of the curve if $f = \partial f/\partial x =
\partial f/\partial y = 0$ at p. Otherwise p is a nonsingular point. Prove that an irreducible curve has
only finitely many singular points.

9.10. Let L be the (complex) line $\{ax + by + c = 0\}$ in $\mathbb{C}^2$, and let C be the algebraic curve
$\{f(x, y) = 0\}$, where f is an irreducible polynomial of degree d. Prove that $C \cap L$ contains at
most d points unless C = L.

9.11. Let $C_1$ and $C_2$ be the zeros of quadratic polynomials $f_1$ and $f_2$ respectively that don’t
have a common linear factor.

(a) Let p and q be distinct points of intersection of $C_1$ and $C_2$, and let L be the (complex)
line through p and q. Prove that there are constants $c_1$ and $c_2$, not both zero, so that
$g = c_1 f_1 + c_2 f_2$ vanishes identically on L. Prove also that g is the product of linear
polynomials.

Hint: Force g to vanish at a third point of L.

(b) Prove that $C_1$ and $C_2$ have at most 4 points in common.

9.12. Prove in two ways that the three polynomials $f_1 = t^2 + x^2 - 2$, $f_2 = tx - 1$, $f_3 = t^3 + 5tx^2 + 1$
generate the unit ideal in $\mathbb{C}[t, x]$: by showing that they have no common zeros, and also
by writing 1 as a linear combination of $f_1$, $f_2$, $f_3$, with polynomial coefficients.

*9.13. Let $\varphi : \mathbb{C}[x, y] \to \mathbb{C}[t]$ be a homomorphism that is the identity on $\mathbb{C}$ and sends $x \rightsquigarrow x(t)$,
$y \rightsquigarrow y(t)$, and such that x(t) and y(t) are not both constant. Prove that the kernel of $\varphi$ is a
principal ideal.


Miscellaneous Exercises 


M.1. Prove or disprove: If $a^2 = a$ for every a in a nonzero ring R, then R has characteristic 2.

M.2. A semigroup S is a set with an associative law of composition having an identity element.
Let S be a commutative semigroup that satisfies the cancellation law: ab = ac implies
b = c. Prove that S can be embedded into a group.

M.3. Let R denote the set of sequences $a = (a_1, a_2, a_3, \ldots)$ of real numbers that are eventually
constant: $a_n = a_{n+1} = \cdots$ for sufficiently large n. Addition and multiplication are
componentwise, that is, addition is vector addition and multiplication is defined by
$ab = (a_1 b_1, a_2 b_2, \ldots)$. Prove that R is a ring, and determine its maximal ideals.

M.4. (a) Classify rings R that contain $\mathbb{C}$ and have dimension 2 as vector space over $\mathbb{C}$.

(b) Do the same for rings that have dimension 3.

M.5. Define $\varphi : \mathbb{C}[x, y] \to \mathbb{C}[x] \times \mathbb{C}[y] \times \mathbb{C}[t]$ by $f(x, y) \rightsquigarrow (f(x, 0), f(0, y), f(t, t))$. Determine
the image of this map, and find generators for the kernel.

M.6. Prove that the locus $y = \sin x$ in $\mathbb{R}^2$ doesn’t lie on any algebraic curve in $\mathbb{C}^2$.

*M.7. Let X denote the closed unit interval [0, 1], and let R be the ring of continuous functions
$X \to \mathbb{R}$.

(a) Let $f_1, \ldots, f_n$ be functions with no common zero on X. Prove that the ideal generated
by these functions is the unit ideal.
Hint: Consider $f_1^2 + \cdots + f_n^2$.

(b) Establish a bijective correspondence between maximal ideals of R and points on the
interval.


CHAPTER 12 


Factoring 


You probably think that one knows everything about polynomials. 
—Serge Lang 


12.1 FACTORING INTEGERS 


We study division in rings in this chapter, modeling our investigation on properties of the 
ring of integers, and we begin by reviewing those properties. Some have been used without 
comment in earlier chapters of the book, and some have been proved before. 


A property from which many others follow is division with remainder: If a and b are
integers and a is positive, there exist integers q and r so that

(12.1.1) $b = aq + r$, and $0 \le r < a$.


We’ve seen some of its important consequences: 


Theorem 12.1.2 


(a) Every ideal of the ring $\mathbb{Z}$ of integers is principal.
(b) A pair a, b of integers, not both zero, has a greatest common divisor, a positive integer
d with these properties:

(i) $\mathbb{Z}d = \mathbb{Z}a + \mathbb{Z}b$,
(ii) d divides a and d divides b,
(iii) if an integer e divides a and b, then e divides d,
(iv) there are integers r and s such that $d = ra + sb$.

(c) If a prime integer p divides a product ab of integers, then p divides a or p divides b.

(d) Fundamental Theorem of Arithmetic: Every positive integer $a \neq 1$ can be written as
a product $a = p_1 \cdots p_k$, where the $p_i$ are positive prime integers, and k > 0. This
expression is unique except for the ordering of the prime factors.


The proofs of these facts will be reviewed in a more general setting in the next section. 
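
Properties (b)(i) and (b)(iv) of Theorem 12.1.2 can be made concrete by repeated division with remainder. The following Python sketch is an illustration only and is not taken from the text; the function name `extended_gcd` is hypothetical, and the method shown is the standard extended Euclidean algorithm.

```python
def extended_gcd(a, b):
    """Return (d, r, s) with d = gcd(a, b) >= 0 and d == r*a + s*b."""
    # Invariants: old_d == old_r*a + old_s*b  and  d == r*a + s*b.
    old_d, d = a, b
    old_r, r = 1, 0
    old_s, s = 0, 1
    while d != 0:
        q = old_d // d                      # division with remainder (12.1.1)
        old_d, d = d, old_d - q * d
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_d < 0:                           # normalize the divisor to be positive
        old_d, old_r, old_s = -old_d, -old_r, -old_s
    return old_d, old_r, old_s

d, r, s = extended_gcd(12, 18)
print(d, r, s, r * 12 + s * 18)
```

The coefficients r and s are not unique; the sketch returns one valid pair.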
12.2 UNIQUE FACTORIZATION DOMAINS


It is natural to ask which rings have properties analogous to those of the ring of integers, 
and we investigate this question here. There are relatively few rings for which all parts of 
Theorem 12.1.2 can be extended, but polynomial rings over fields are important cases in 
which they do extend. 


When discussing factoring, we assume that the ring R is an integral domain, so that the 
Cancellation Law 11.7.1 is available, and we exclude the element zero from consideration. 
Here is some terminology that we use: 


(12.2.1)
u is a unit if u has a multiplicative inverse in R.
a divides b if b = aq for some q in R.
a is a proper divisor of b if b = aq and neither a nor q is a unit.
a and b are associates if each divides the other, or if b = ua, and u is a unit.
a is irreducible if a is not a unit, and it has no proper divisor:
    its only divisors are units and associates.
p is a prime element if p is not a unit, and whenever p divides a product ab,
    then p divides a or p divides b.


These concepts can be interpreted in terms of the principal ideals generated by the elements.
Recall that the principal ideal (a) generated by an element a consists of all elements of R
that are divisible by a. Then


(12.2.2)
u is a unit $\Leftrightarrow$ (u) = (1).
a divides b $\Leftrightarrow$ $(b) \subset (a)$.
a is a proper divisor of b $\Leftrightarrow$ (b) < (a) < (1).
a and b are associates $\Leftrightarrow$ (a) = (b).
a is irreducible $\Leftrightarrow$ (a) < (1), and there is no principal ideal (c)
    such that (a) < (c) < (1).
p is a prime element $\Leftrightarrow$ $ab \in (p)$ implies $a \in (p)$ or $b \in (p)$.


Before continuing, we note one of the simplest examples of a ring element that has
more than one factorization. The ring is $R = \mathbb{Z}[\sqrt{-5}]$. It consists of all complex numbers of
the form $a + b\sqrt{-5}$, where a and b are integers. We will use this ring as an example several
times in this chapter and the next. In R, the integer 6 can be factored in two ways:


(12.2.3) $2 \cdot 3 = 6 = (1 + \sqrt{-5})(1 - \sqrt{-5})$.

It isn’t hard to show that none of the four terms 2, 3, $1 + \sqrt{-5}$, $1 - \sqrt{-5}$ can be factored
further; they are irreducible elements of the ring.
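
One standard way to check the irreducibility claim, which is not spelled out at this point in the text, uses the multiplicative norm $N(a + b\sqrt{-5}) = a^2 + 5b^2$. The four terms have norms 4, 9, 6, 6, so a proper factor would need norm 2 or 3, and units have norm 1. The Python sketch below is an illustration under that assumption; the names `norm`, `elements_of_norm`, and `multiply` are hypothetical, and elements are stored as integer pairs (a, b).

```python
def norm(a, b):
    """Norm of a + b*sqrt(-5), namely a^2 + 5*b^2."""
    return a * a + 5 * b * b

def elements_of_norm(n):
    """All integer pairs (a, b) with a^2 + 5*b^2 == n."""
    bound = int(n ** 0.5) + 1
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm(a, b) == n]

def multiply(u, v):
    """Product of a + b*sqrt(-5) and c + d*sqrt(-5), as a pair."""
    (a, b), (c, d) = u, v
    return (a * c - 5 * b * d, a * d + b * c)

print(multiply((2, 0), (3, 0)), multiply((1, 1), (1, -1)))  # both factorizations of 6
print(elements_of_norm(2), elements_of_norm(3))              # no elements of norm 2 or 3
```

Since no element has norm 2 or 3, none of the four terms has a proper divisor.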


We abstract the procedure of division with remainder first. To make sense of division
with remainder, we need a measure of size of an element. A size function on an integral
domain R can be any function $\sigma$ whose domain is the set of nonzero elements of R, and
whose range is the set of nonnegative integers. An integral domain R is a Euclidean domain
if there is a size function $\sigma$ on R such that division with remainder is possible, in the following
sense:


(12.2.4) Let a and b be elements of R, and suppose that a is not zero.
There are elements q and r in R such that b = aq + r,
and either r = 0 or else $\sigma(r) < \sigma(a)$.


The most important fact about division with remainder is that r is zero if and only if a
divides b.


Proposition 12.2.5 


(a) The ring $\mathbb{Z}$ of integers is a Euclidean domain, with size function $\sigma(a) = |a|$.

(b) A polynomial ring F[x] in one variable over a field F is a Euclidean domain, with
$\sigma(f)$ = degree of f.

(c) The ring $\mathbb{Z}[i]$ of Gauss integers is a Euclidean domain, with $\sigma(a) = |a|^2$.


The ring of integers and the polynomial rings were discussed in Chapter 11. We show
here that the ring of Gauss integers is a Euclidean domain. The elements of $\mathbb{Z}[i]$ form a
square lattice in the complex plane, and the multiples of a given nonzero element $\alpha$ form
the principal ideal $(\alpha)$, which is a similar geometric figure. If we write $\alpha = re^{i\theta}$, then $(\alpha)$ is
obtained from the lattice $\mathbb{Z}[i]$ by rotating through the angle $\theta$ and stretching by the factor r,
as is illustrated below with $\alpha = 2 + i$:


(12.2.6) A Principal Ideal in the Ring of Gauss Integers. [Figure omitted]


For any complex number $\beta$, there is a point of the lattice $(\alpha)$ whose square distance from $\beta$
is less than $|\alpha|^2$. We choose such a point, say $\gamma = \alpha q$, and let $r = \beta - \gamma$. Then $\beta = \alpha q + r$,
and $|r|^2 < |\alpha|^2$, as required. Here q is in $\mathbb{Z}[i]$, and if $\beta$ is in $\mathbb{Z}[i]$, so is r.

Division with remainder is not unique: There may be as many as four choices for the
element $\gamma$. $\square$
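
The choice of the lattice point $\gamma = \alpha q$ can be carried out by rounding the complex quotient $\beta/\alpha$ to a nearest Gauss integer. The Python sketch below illustrates this one choice; it is not the text's construction verbatim. The name `gauss_divmod` is hypothetical, Gauss integers are stored as pairs (real part, imaginary part) of Python integers, and $\alpha$ is assumed nonzero.

```python
def gauss_divmod(beta, alpha):
    """Division with remainder in Z[i]: returns (q, r) with beta = alpha*q + r
    and |r|^2 < |alpha|^2. Elements are integer pairs (re, im); alpha != 0."""
    (b0, b1), (a0, a1) = beta, alpha
    n = a0 * a0 + a1 * a1                  # |alpha|^2
    # beta / alpha = beta * conj(alpha) / n; x and y are the numerators.
    x = b0 * a0 + b1 * a1
    y = b1 * a0 - b0 * a1
    # Round x/n and y/n to a nearest integer using exact integer arithmetic.
    q0 = (2 * x + n) // (2 * n)
    q1 = (2 * y + n) // (2 * n)
    # r = beta - alpha*q
    r0 = b0 - (a0 * q0 - a1 * q1)
    r1 = b1 - (a0 * q1 + a1 * q0)
    return (q0, q1), (r0, r1)

print(gauss_divmod((7, 3), (2, 1)))   # divide 7 + 3i by 2 + i
```

Rounding each coordinate moves the quotient by at most 1/2 in each direction, so this choice gives $|r|^2 \le |\alpha|^2/2$, which is stronger than the required inequality.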


• An integral domain in which every ideal is principal is called a principal ideal domain.

Proposition 12.2.7 A Euclidean domain is a principal ideal domain.


Proof. We mimic the proof that the ring of integers is a principal ideal domain once more.
Let R be a Euclidean domain with size function $\sigma$, and let A be an ideal of R. We must
show that A is principal. The zero ideal is principal, so we may assume that A is not the zero
ideal. Then A contains a nonzero element. We choose a nonzero element a of A such that
$\sigma(a)$ is as small as possible, and we show that A is the principal ideal (a) of multiples of a.
Because A is an ideal and a is in A, any multiple aq with q in R is in A. So $(a) \subset A$. To
show that $A \subset (a)$, we take an arbitrary element b of A. We use division with remainder to
write b = aq + r, where either r = 0, or $\sigma(r) < \sigma(a)$. Then b and aq are in A, so r = b − aq
is in A too. Since $\sigma(a)$ is minimal, we can’t have $\sigma(r) < \sigma(a)$, and it follows that r = 0. This
shows that a divides b, and hence that b is in the principal ideal (a). Since b is arbitrary,
$A \subset (a)$, and therefore A = (a). $\square$


Let a and b be elements of an integral domain R, not both zero. A greatest common 
divisor d of a and b is an element with the following properties: 


(a) d divides a and b. 
(b) If an element e divides a and b, then e divides d. 


Any two greatest common divisors d and d’ are associate elements. The first condition tells 
us that both d and d’ divide a and b, and then the second one tells us that d’ divides d and 
also that d divides d’. 

However, a greatest common divisor may not exist. There will often be a common
divisor m that is maximal, meaning that a/m and b/m have no proper divisor in common. But
this element may fail to satisfy condition (b). For instance, in the ring $\mathbb{Z}[\sqrt{-5}]$ considered
above (12.2.3), the elements a = 6 and $b = 2 + 2\sqrt{-5}$ are divisible both by 2 and by
$1 + \sqrt{-5}$. These are maximal elements among common divisors, but neither one divides
the other.
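
The divisibility claims in this example can be checked with exact integer arithmetic. The Python sketch below is an illustration only; the name `divides` is hypothetical, elements $a + b\sqrt{-5}$ are stored as integer pairs (a, b), and the test divides by multiplying with the conjugate and checking that the result is divisible by the norm.

```python
def divides(u, v):
    """True if u divides v in Z[sqrt(-5)]. Elements are pairs (a, b) = a + b*sqrt(-5);
    u is assumed nonzero."""
    (a, b), (c, d) = u, v
    n = a * a + 5 * b * b            # norm of u
    # v / u = v * conj(u) / n, with conj(u) = a - b*sqrt(-5).
    re = c * a + 5 * d * b
    im = d * a - c * b
    return re % n == 0 and im % n == 0

two, w = (2, 0), (1, 1)              # 2 and 1 + sqrt(-5)
a, b = (6, 0), (2, 2)                # 6 and 2 + 2*sqrt(-5)
print(divides(two, a), divides(two, b), divides(w, a), divides(w, b))
print(divides(two, w), divides(w, two))
```

The first line reports that both 2 and $1 + \sqrt{-5}$ are common divisors; the second that neither divides the other.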

One case in which a greatest common divisor does exist is that a and b have no common 
factors except units. Then 1 is a greatest common divisor. When this is so, a and b are said 
to be relatively prime. 

Greatest common divisors always exist in a principal ideal domain: 


Proposition 12.2.8 Let R be a principal ideal domain, and let a and b be elements of R,
which are not both zero. An element d that generates the ideal (a, b) = Ra + Rb is a
greatest common divisor of a and b. It has these properties:

(a) Rd = Ra + Rb,

(b) d divides a and b.

(c) If an element e of R divides both a and b, it also divides d.

(d) There are elements r and s in R such that d = ra + sb.


Proof. This is essentially the same proof as for the ring of integers. (a) restates that d
generates the ideal (a, b). (b) states that a and b are in Rd, and (d) states that d is in the
ideal Ra + Rb. For (c), we note that if e divides a and b then a and b are elements of Re.
In that case, Re contains Ra + Rb = Rd, so e divides d. $\square$
Corollary 12.2.9 Let R be a principal ideal domain.


(a) If elements a and b of R are relatively prime, then 1 is a linear combination ra + sb.
(b) An element of R is irreducible if and only if it is a prime element.
(c) The maximal ideals of R are the principal ideals generated by the irreducible elements.


Proof. (a) This follows from Proposition 12.2.8(d).


(b) In any integral domain, a prime element is irreducible. We prove this below, in Lemma
12.2.10. Suppose that R is a principal ideal domain and that an irreducible element q of R
divides a product ab. We have to show that if q does not divide a, then q divides b. Let d be
a greatest common divisor of a and q. Since q is irreducible, the divisors of q are the units
and the associates of q. Since q does not divide a, d is not an associate of q. So d is a unit, q
and a are relatively prime, and 1 = ra + sq with r and s in R. We multiply by b: b = rab + sqb.
Both terms on the right side of this equation are divisible by q, so q divides the left side, b.


(c) Let q be an irreducible element. Its divisors are units and associates. Therefore the only
principal ideals that contain (q) are (q) itself and the unit ideal (1) (see (12.2.2)). Since
every ideal of R is principal, these are the only ideals that contain (q). Therefore (q) is a
maximal ideal. Conversely, if an element b has a proper divisor a, then (b) < (a) < (1), so
(b) is not a maximal ideal. $\square$


Lemma 12.2.10 In an integral domain R, a prime element is irreducible. 


Proof. Suppose that a prime element p is a product, say p = ab. Then p divides one of the
factors, say a. But the equation p = ab shows that a divides p too. So a and p are associates
and b is a unit. The factorization is not proper. $\square$


What analogy to the Fundamental Theorem of Arithmetic 12.1.2(d) could one hope for 
in an integral domain? We may divide the desired statement of uniqueness of factorization 
into two parts. First, a given element should be a product of irreducible elements, and 
second, that product should be essentially unique. 

Units in a ring complicate the statement of uniqueness. Unit factors must be disregarded
and associate factors must be considered equivalent. The units in the ring of integers are
$\pm 1$, and in this ring it is natural to work with positive integers. Similarly, in the polynomial
ring F[x] over a field, it is natural to work with monic polynomials. But we don’t have a
reasonable way to normalize elements in an arbitrary integral domain; it is best not to try.

We say that factoring in an integral domain R is unique if, whenever an element a of
R is written in two ways as a product of irreducible elements, say


(12.2.11) $p_1 \cdots p_m = a = q_1 \cdots q_n$,


then m = n, and if the right side is rearranged suitably, $q_i$ is an associate of $p_i$ for each i. So
in the statement of uniqueness, associate factorizations are considered equivalent.
For example, in the ring of Gauss integers,


$(2 + i)(2 - i) = 5 = (1 + 2i)(1 - 2i)$.


These two factorizations of the element 5 are equivalent because the terms that appear on
the left and right sides are associates: $-i(2 + i) = 1 - 2i$ and $i(2 - i) = 1 + 2i$.
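
The associate relations in this example are easy to confirm numerically. In the Python sketch below, which is an illustration only, Gauss integers are represented by Python complex numbers with small integer parts (so the floating-point arithmetic is exact here), and the name `are_associates` is hypothetical.

```python
UNITS = (1, -1, 1j, -1j)          # the four units of Z[i]

def are_associates(u, v):
    """True if v = (unit) * u for one of the four units of Z[i]."""
    return any(unit * u == v for unit in UNITS)

print((2 + 1j) * (2 - 1j), (1 + 2j) * (1 - 2j))   # both products equal 5
print(are_associates(2 + 1j, 1 - 2j))             # -i(2 + i) = 1 - 2i
print(are_associates(2 - 1j, 1 + 2j))             # i(2 - i) = 1 + 2i
print(are_associates(2 + 1j, 2 - 1j))             # the two factors of 5 themselves
```

The last line shows that 2 + i and 2 − i are not associates of each other, so 5 has two inequivalent irreducible factors even though its factorization is unique.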
It is neater to work with principal ideals than with elements, because associates generate
the same principal ideal. However, it isn’t too cumbersome to use elements and we will stay 
with them here. The importance of ideals will become clear in the next chapter. 

When we attempt to write an element a as a product of irreducible elements, we always
assume that it is not zero and not a unit. Then we attempt to factor a, proceeding this way: If
a is irreducible, we stop. If not, then a has a proper factor, so it decomposes in some way as
a product, say $a = a_1 b_1$, where neither $a_1$ nor $b_1$ is a unit. We continue factoring $a_1$ and $b_1$,
if possible, and we hope that this procedure terminates; in other words, we hope that after a
finite number of steps all the factors are irreducible. We say that factoring terminates in R if
this is always true, and we refer to a factorization into irreducible elements as an irreducible
factorization.

An integral domain R is a unique factorization domain if it has these properties: 


(12.2.12)
• Factoring terminates.
• The irreducible factorization of an element a is unique in the sense described above.


The condition that factoring terminates has a useful description in terms of principal 
ideals: 


Proposition 12.2.13 Let R be an integral domain. The following conditions are equivalent: 


• Factoring terminates.

• R does not contain an infinite strictly increasing chain (a_1) < (a_2) < (a_3) < ⋯ of
principal ideals.


Proof. If the process of factoring doesn’t terminate, there will be an element a_1 with a
proper factorization such that the process fails to terminate for at least one of the factors.
Let’s say that the proper factorization is a_1 = a_2 b_2, and that the process fails to terminate
for the factor we call a_2. Since a_2 is a proper divisor of a_1, (a_1) < (a_2) (see (12.2.2)). We
replace a_1 by a_2 and repeat. In this way we obtain an infinite chain.

Conversely, if there is a strictly increasing chain (a_1) < (a_2) < ⋯, then none of the
ideals (a_n) is the unit ideal, and therefore a_2 is a proper divisor of a_1, a_3 is a proper divisor
of a_2, and so on (12.2.2). This gives us a nonterminating process. □


We will rarely encounter rings in which factoring fails to terminate, and we will prove 
a theorem that explains the reason later (see (14.6.9)), so we won’t worry much about it 
here. In practice it is the uniqueness that gives trouble. Factoring into irreducible elements 
will usually be possible, but it will not be unique, even when one takes into account the 
ambiguity of associate factors. 


Going back to the ring R = Z[√−5], it isn’t hard to show that all of the elements 2, 3,
1 + √−5 and 1 − √−5 are irreducible, and that the units of R are 1 and −1. So 2 is not an
associate of 1 + √−5 or of 1 − √−5. Therefore 2·3 = 6 = (1 + √−5)(1 − √−5) are essentially
different factorizations: R is not a unique factorization domain.

Proposition 12.2.14


(a) Let R be an integral domain. Suppose that factoring terminates in R. Then R is a unique 
factorization domain if and only if every irreducible element is a prime element. 


(b) A principal ideal domain is a unique factorization domain. 


(c) The rings Z, Z[i] and the polynomial ring F[x] in one variable over a field F are unique 
factorization domains. 


Thus the phrases irreducible factorization and prime factorization are synonymous in 
unique factorization domains, but most rings contain irreducible elements that are not prime. 
In the ring Z[√−5], the element 2 is irreducible. It is not prime because, though it divides the
product (1 + √−5)(1 − √−5), it does not divide either factor.
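
The claims about Z[√−5] can be tested with the norm N(a + b√−5) = a^2 + 5b^2, which is multiplicative. The sketch below is our own illustration, not the text's argument: it encodes a + b√−5 as a pair (a, b) and uses the sufficient test that an element is irreducible when no element of the ring has norm equal to a proper divisor of its norm. All function names are hypothetical.

```python
from math import isqrt


def mul(x, y):
    """(a + b*sqrt(-5)) * (c + d*sqrt(-5)), elements encoded as (a, b)."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)


def norm(x):
    a, b = x
    return a * a + 5 * b * b


def norm_is_attained(n):
    """True if a^2 + 5*b^2 = n has an integer solution."""
    for b in range(isqrt(n // 5) + 1):
        a2 = n - 5 * b * b
        if isqrt(a2) ** 2 == a2:
            return True
    return False


def irreducible_by_norm(x):
    """Sufficient test: no element has norm equal to a proper divisor of N(x)."""
    n = norm(x)
    return n > 1 and not any(
        n % d == 0 and norm_is_attained(d) for d in range(2, n)
    )


def two_divides(x):
    """2 divides a + b*sqrt(-5) exactly when a and b are both even."""
    return x[0] % 2 == 0 and x[1] % 2 == 0


elements = [(2, 0), (3, 0), (1, 1), (1, -1)]
print([irreducible_by_norm(x) for x in elements])
print(mul((2, 0), (3, 0)), mul((1, 1), (1, -1)))
print(two_divides(mul((1, 1), (1, -1))), two_divides((1, 1)), two_divides((1, -1)))
```

Since the only elements of norm 1 are ±1 and the norms of 2 and 1 ± √−5 differ (4 versus 6), the two factorizations of 6 are not equivalent; the last line shows that 2 divides the product but neither factor.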

The converse of (b) is not true. We will see in the next section that the ring Z[x] of 
integer polynomials is a unique factorization domain, though it isn’t a principal ideal domain. 


Proof of Proposition (12.2.14). First of all, (c) follows from (b) because the rings mentioned 
in (c) are Euclidean domains, and therefore principal ideal domains. 


(a) Let R be a ring in which every irreducible element is prime, and suppose that an element
a factors in two ways into irreducible elements, say p_1 ⋯ p_m = a = q_1 ⋯ q_n, where m ≤ n.
If n = 1, then m = 1 and p_1 = q_1. Suppose that n > 1. Since p_1 is prime, it divides one of
the factors q_1, ..., q_n, say q_1. Since q_1 is irreducible and since p_1 is not a unit, q_1 and p_1 are
associates, say p_1 = u q_1, where u is a unit. We move the unit factor over to q_2, replacing
q_1 by u q_1 and q_2 by u^{-1} q_2. The result is that now p_1 = q_1. Then we cancel p_1 and use
induction on n.

Conversely, suppose that there is an irreducible element p that is not prime. Then 
there are elements a and b such that p divides the product r = ab, say r = pc, but p 
does not divide a or b. By factoring a, b, and c into irreducible elements, we obtain two 
inequivalent factorizations of r. 


(b) Let R be a principal ideal domain. Since every irreducible element of R is prime (12.2.8), 
we need only prove that factoring terminates (12.2.14). We do this by showing that R 
contains no infinite strictly increasing chain of principal ideals. We suppose given an infinite 
weakly increasing chain 

(a_1) ⊂ (a_2) ⊂ (a_3) ⊂ ⋯,


and we prove that it cannot be strictly increasing. 


Lemma 12.2.15 Let I_1 ⊂ I_2 ⊂ I_3 ⊂ ⋯ be an increasing chain of ideals in a ring R. The union
J = ∪ I_n is an ideal.


Proof. If u and v are in J, they are both in I_n for some n. Then u + v and ru, for any r in
R, are also in I_n, and therefore they are in J. This shows that J is an ideal. □


We apply this lemma to our chain of principal ideals, with I_n = (a_n), and we use the
hypothesis that R is a principal ideal domain to conclude that the union J is a principal
ideal, say J = (b). Then since b is in the union of the ideals (a_n), it is in one of those ideals.
But if b is in (a_n), then (b) ⊂ (a_n). On the other hand, (a_n) ⊂ (a_{n+1}) ⊂ (b). Therefore
(b) = (a_n) = (a_{n+1}). The chain is not strictly increasing. □


One can decide whether an element a divides another element b in a unique factorization 
domain, in terms of their irreducible factorizations. 


Proposition 12.2.16 Let R be a unique factorization domain. 


(a) Let a = p_1 ⋯ p_m and b = q_1 ⋯ q_n be irreducible factorizations of two elements of
R. Then a divides b in R if and only if m ≤ n and, when the factors q_j are arranged
suitably, p_i is an associate of q_i for i = 1, ..., m.

(b) Any pair of elements a, b, not both zero, has a greatest common divisor. 


Proof. (a) This is very similar to the proof of Proposition 12.2.14(a). The irreducible factors
of a are prime elements. If a divides b, then p_1 divides b, and therefore p_1 divides some q_i,
say q_1. Then p_1 and q_1 are associates. The assertion follows by induction when we cancel p_1
from a and q_1 from b. We omit the proof of (b). □


Note: Any two greatest common divisors of a and b are associates. But unless a unique 
factorization domain is a principal ideal domain, the greatest common divisor, though it 
exists, needn't have the form ra + sb. The greatest common divisor of 2 and x in the unique 
factorization domain Z[x] is 1, but we cannot write 1 as a linear combination of those 
elements with integer polynomials as coefficients. □


We review the results we have obtained for the important case of a polynomial ring 
F[x] over a field. The units in the polynomial ring F[x] are the nonzero constants. We can 
factor the leading coefficient out of a nonzero polynomial to make it monic, and the only 
monic associate of a monic polynomial f is f itself. By working with monic polynomials, 
the ambiguity of associate factorizations can be avoided. With this taken into account, the 
next theorem follows from Proposition 12.2.14. 


Theorem 12.2.17 Let F[x] be the polynomial ring in one variable over a field F.


(a) Two polynomials f and g, not both zero, have a unique monic greatest common divisor 
d, and there are polynomials r and s such that rf + sg = d. 

(b) If two polynomials f and g have no nonconstant factor in common, then there are 
polynomials r and s such that rf + sg = 1. 

(c) Every irreducible polynomial p in F[x] is a prime element of F[x]: If p divides a 
product fg, then p divides f or p divides g. 

(d) Unique factorization: Every monic polynomial in F[x] can be written as a product
p_1 ⋯ p_k, where p_i are monic irreducible polynomials in F[x] and k ≥ 0. This factorization
is unique except for the ordering of the terms. □


In the future, when we speak of the greatest common divisor of two polynomials with 
coefficients in a field, we will mean the unique monic polynomial with the properties (a) 
above. This greatest common divisor will sometimes be denoted by gcd(f, g).

The greatest common divisor gcd(f, g) of two polynomials f and g, not both zero,
with coefficients in a field F can be found by repeated division with remainder, the process
called the Euclidean algorithm that we mentioned in Section 2.3 for the ring of integers:
Suppose that the degree of g is at least equal to the degree of f. We write g = fq + r where
the remainder r, if it is not zero, has degree less than that of f. Then gcd(f, g) = gcd(f, r).
If r = 0, gcd(f, g) = f. If not, we replace f and g by r and f, and repeat the process.
Since degrees are being lowered, the process is finite. The analogous method can be used to
determine greatest common divisors in any Euclidean domain.
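
The procedure just described can be sketched in a few lines. This is an illustrative implementation over F = Q, assuming polynomials are stored as coefficient lists with the lowest degree first; the names trim, poly_divmod and poly_gcd are ours. The result is normalized to be monic, in line with the convention adopted above.

```python
from fractions import Fraction


def trim(f):
    """Drop trailing zero coefficients (lists are lowest degree first)."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_divmod(g, f):
    """Return (q, r) with g = f*q + r and r = 0 or deg r < deg f; f must be nonzero."""
    g = trim(Fraction(c) for c in g)
    f = trim(Fraction(c) for c in f)
    q = [Fraction(0)] * max(len(g) - len(f) + 1, 0)
    r = g
    while len(r) >= len(f):
        shift = len(r) - len(f)
        coeff = r[-1] / f[-1]          # cancels the leading term of r
        q[shift] = coeff
        for i, c in enumerate(f):
            r[i + shift] -= coeff * c
        r = trim(r)
    return q, r


def poly_gcd(f, g):
    """Monic greatest common divisor of f and g (not both zero) in Q[x]."""
    f = trim(Fraction(c) for c in f)
    g = trim(Fraction(c) for c in g)
    while g:
        _, r = poly_divmod(f, g)       # replace (f, g) by (g, remainder)
        f, g = g, r
    return [c / f[-1] for c in f]


# gcd(x^2 - 1, x^3 - 1) should be x - 1
print(poly_gcd([-1, 0, 1], [-1, 0, 0, 1]))
```

The loop does not require deg g ≥ deg f: if the first argument has lower degree, the first division simply swaps the two polynomials.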


Over the complex numbers, every polynomial of positive degree has a root α, and
therefore a divisor of the form x − α. The irreducible polynomials are linear, and the
irreducible factorization of a monic polynomial has the form


(12.2.18) f(x) = (x − α_1) ⋯ (x − α_n),


where α_i are the roots of f(x), with repetitions for multiple roots. The uniqueness of this
factorization is not surprising.

When F = R, there are two classes of irreducible polynomials: linear and quadratic. A 
real quadratic polynomial x^2 + bx + c is irreducible if and only if its discriminant b^2 − 4c
is negative, in which case it has a pair of complex conjugate roots. The fact that every 
irreducible polynomial over the complex numbers is linear implies that no real polynomial 
of degree >2 is irreducible. 


Proposition 12.2.19 Let α be a complex, not real, root of a real polynomial f. Then the
complex conjugate ᾱ is also a root of f. The quadratic polynomial g = (x − α)(x − ᾱ) has
real coefficients, and it divides f. □


Factoring polynomials in the ring Q[x] of polynomials with rational coefficients is more 
interesting, because there exist irreducible polynomials in Q[x] of arbitrary degree. This is 
explained in the next two sections. Neither the form of the irreducible factorization nor its 
uniqueness are intuitively clear in this case. 

For future reference, we note the following elementary fact: 


Proposition 12.2.20 A polynomial f of degree n with coefficients in a field F has at most n 
roots in F. 


Proof. An element α is a root of f if and only if x − α divides f (11.2.11). If so, we can
write f(x) = (x − α)q(x), where q(x) is a polynomial of degree n − 1. Let β be a root of
f different from α. Substituting x = β, we obtain 0 = (β − α)q(β). Since β is not equal
to α, it must be a root of q. By induction on the degree, q has at most n − 1 roots in F.
Putting those roots together with α, we see that f has at most n roots. □


12.3 GAUSS’S LEMMA


Every monic polynomial f(x) with rational coefficients can be expressed uniquely in the
form p_1 ⋯ p_k, where p_i are monic polynomials that are irreducible elements in the ring
Q[x]. But suppose that a polynomial f(x) has integer coefficients, and that it factors in Q[x].
Can it be factored without leaving the ring Z[x] of integer polynomials? We will see that it
can, and also that Z[x] is a unique factorization domain.

Here is an example of an irreducible factorization in integer polynomials:

6x^3 + 9x^2 + 9x + 3 = 3(2x + 1)(x^2 + x + 1).


As we see, irreducible factorizations are slightly more complicated in Z[x] than in Q[x]. 
Prime integers are irreducible elements of Z[x], and they may appear in the factorization of a 
polynomial. And, if we want to stay with integer coefficients, we can’t require monic factors. 


We have two main tools for studying factoring in Z[x]. The first is the inclusion of the 
integer polynomial ring into the ring of polynomials with rational coefficients: 


Z[x] ⊂ Q[x].


This can be useful because algebra in the ring Q[x] is simpler. 
The second tool is reduction modulo some integer prime p, the homomorphism 


(12.3.1) ψ_p : Z[x] → F_p[x]


that sends x ↦ x (11.3.6). We’ll often denote the image ψ_p(f) of an integer polynomial by
f̄, though this notation is ambiguous because it doesn’t mention p.
The next lemma should be clear. 


Lemma 12.3.2 Let f(x) = a_n x^n + ⋯ + a_1 x + a_0 be an integer polynomial, and let p be an
integer prime. The following are equivalent:

• p divides every coefficient a_i of f in Z,

• p divides f in Z[x],

• f is in the kernel of ψ_p. □


The lemma shows that the kernel of ψ_p can be interpreted easily without mentioning
the map. But the facts that ψ_p is a homomorphism and that its image F_p[x] is an integral
domain make the interpretation as a kernel useful.

• A polynomial f(x) = a_n x^n + ⋯ + a_1 x + a_0 with rational coefficients is called primitive if it
is an integer polynomial of positive degree, the greatest common divisor of its coefficients
a_0, ..., a_n in the integers is 1, and its leading coefficient a_n is positive.


Lemma 12.3.3 Let f be an integer polynomial of positive degree, with positive leading
coefficient. The following conditions are equivalent:

• f is primitive,

• f is not divisible by any integer prime p,

• for every integer prime p, ψ_p(f) ≠ 0. □


Proposition 12.3.4

(a) An integer is a prime element of Z[x] if and only if it is a prime integer. So a prime
integer p divides a product fg of integer polynomials if and only if p divides f or p
divides g.

(b) (Gauss’s Lemma) The product of primitive polynomials is primitive.


Proof. (a) It is obvious that an integer must be a prime if it is an irreducible element of Z[x].
Let p be a prime integer. We use bar notation: f̄ = ψ_p(f). Then p divides fg if and only if
f̄ḡ = 0, and since F_p[x] is a domain, this is true if and only if f̄ = 0 or ḡ = 0, i.e., if and only
if p divides f or p divides g.


(b) Suppose that f and g are primitive polynomials. Since their leading coefficients are 
positive, the leading coefficient of fg is also positive. Moreover, no prime p divides f or g, 
and by (a), no prime divides fg. So fg is primitive. □


Lemma 12.3.5 Every polynomial f(x) of positive degree with rational coefficients can be
written uniquely as a product f(x) = c f_0(x), where c is a rational number and f_0(x) is a
primitive polynomial. Moreover, c is an integer if and only if f is an integer polynomial. If
f is an integer polynomial, then the greatest common divisor of the coefficients of f is ±c.


Proof. To find f_0, we first multiply f by an integer d to clear the denominators in its
coefficients. This will give us a polynomial df = f_1 with integer coefficients. Then we factor
out the greatest common divisor of the coefficients of f_1, and adjust the sign of the leading
coefficient. The resulting polynomial f_0 is primitive, and f = c f_0 for some rational number
c. This proves existence.

If f is an integer polynomial, we don’t need to clear the denominator. Then c will be 
an integer, and up to sign, it is the greatest common divisor of the coefficients, as stated. 

The uniqueness of this product is important, so we check it carefully. Suppose given
rational numbers c and c′ and primitive polynomials f_0 and f_0′ such that c f_0 = c′ f_0′. We
will show that f_0 = f_0′. Since Q[x] is a domain, it will follow that c = c′.

We multiply the equation c f_0 = c′ f_0′ by an integer and adjust the sign if necessary, to
reduce to the case that c and c′ are positive integers. If c ≠ 1, we choose a prime integer p
that divides c. Then p divides c′ f_0′. Proposition 12.3.4(a) shows that p divides one of the
factors c′ or f_0′. Since f_0′ is primitive, it isn’t divisible by p, so p divides c′. We cancel p
from both sides of the equation. Induction reduces us to the case that c = 1, and the same
reasoning shows that then c′ = 1. So f_0 = f_0′. □
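
The existence half of Lemma 12.3.5 is constructive, and its steps (clear denominators, factor out the greatest common divisor, fix the sign) can be sketched as follows. This is an illustration under our own conventions: coefficients are listed lowest degree first, the function name content_and_primitive is hypothetical, and the multi-argument forms of math.gcd and math.lcm require Python 3.9 or later.

```python
from fractions import Fraction
from math import gcd, lcm


def content_and_primitive(coeffs):
    """Write f = c * f0 with c rational and f0 primitive.

    coeffs lists the rational coefficients of f, lowest degree first;
    f is assumed to have positive degree and nonzero leading coefficient.
    """
    coeffs = [Fraction(a) for a in coeffs]
    d = lcm(*(a.denominator for a in coeffs))   # multiplying by d clears denominators
    ints = [int(a * d) for a in coeffs]         # coefficients of f1 = d * f
    g = gcd(*ints)                              # gcd of the coefficients of f1
    if ints[-1] < 0:                            # make the leading coefficient positive
        g = -g
    f0 = [a // g for a in ints]                 # exact division
    return Fraction(g, d), f0


# (3/2)x^2 - 3x + 9/4 = (3/4)(2x^2 - 4x + 3)
print(content_and_primitive([Fraction(9, 4), -3, Fraction(3, 2)]))
# 6x^3 + 9x^2 + 9x + 3 = 3(2x^3 + 3x^2 + 3x + 1)
print(content_and_primitive([3, 9, 9, 6]))
```

In the second call f is an integer polynomial, and the returned c is an integer equal, up to sign, to the greatest common divisor of the coefficients, as the lemma states.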


Theorem 12.3.6 

(a) Let f_0 be a primitive polynomial, and let g be an integer polynomial. If f_0 divides g in
Q[x], then f_0 divides g in Z[x].

(b) If two integer polynomials f and g have a common nonconstant factor in Q[x], they 
have a common nonconstant factor in Z[x]. 


Proof. (a) Say that g = f_0 q where q has rational coefficients. We show that q has integer
coefficients. We write g = c g_0, and q = c′ q_0, with g_0 and q_0 primitive. Then c g_0 = c′ f_0 q_0.
Gauss’s Lemma tells us that f_0 q_0 is primitive. Therefore by the uniqueness assertion of
Lemma 12.3.5, c = c′ and g_0 = f_0 q_0. Since g is an integer polynomial, c is an integer. So
q = c q_0 is an integer polynomial.


(b) If the integer polynomials f and g have a common factor h in Q[x] and if we write
h = c h_0, where h_0 is primitive, then h_0 also divides f and g in Q[x], and by (a), h_0 divides
both f and g in Z[x]. □

Proposition 12.3.7


(a) Let f be an integer polynomial with positive leading coefficient. Then f is an irreducible
element of Z[x] if and only if it is either a prime integer or a primitive polynomial that 
is irreducible in Q[x]. 

(b) Every irreducible element of Z[x] is a prime element. 


Proof. Proposition 12.3.4(a) proves (a) and (b) for a constant polynomial. If f is irreducible
and not constant, it cannot have an integer factor different from ±1, so if its leading coefficient
is positive, it will be primitive. Suppose that f is a primitive polynomial and that it has a
proper factorization in Q[x], say f = gh. We write g = c g_0 and h = c′ h_0, with g_0 and h_0
primitive. Then g_0 h_0 is primitive. Since f is also primitive, f = g_0 h_0. Therefore f has a
proper factorization in Z[x] too. So if f is reducible in Q[x], it is reducible in Z[x]. The fact
that a primitive polynomial that is reducible in Z[x] is also reducible in Q[x] is clear. This
proves (a).

Let f be a primitive irreducible polynomial that divides a product gh of integer
polynomials. Then f is irreducible in Q[x]. Since Q[x] is a principal ideal domain, f is a
prime element of Q[x] (12.2.8). So f divides g or h in Q[x]. By (12.3.6) f divides g or h in
Z[x]. This shows that f is a prime element, which proves (b). □


Theorem 12.3.8 The polynomial ring Z[x] is a unique factorization domain. Every nonzero
polynomial f(x) ∈ Z[x] that is not ±1 can be written as a product


f(x) = ± p_1 ⋯ p_m q_1(x) ⋯ q_n(x),


where p_i are integer primes and q_j(x) are primitive irreducible polynomials. This expression
is unique except for the order of the factors.


Proof. It is easy to see that factoring terminates in Z[x], so this theorem follows from
Propositions 12.3.7 and 12.2.14. □


The results of this section have analogues for the polynomial ring F[t, x] in two
variables over a field F. To set up the analogy, we regard F[t, x] as the ring F[t][x] of
polynomials in x whose coefficients are polynomials in t. The analogue of the field Q will be
the field F(t) of rational functions in t, the field of fractions of F[t]. We’ll denote this field
by ℱ. Then F[t, x] is a subring of the ring ℱ[x] of polynomials


f = a_n(t) x^n + ⋯ + a_1(t) x + a_0(t)


whose coefficients a_i(t) are rational functions in t. This can be useful because every ideal of
ℱ[x] is principal.

The polynomial f is called primitive if it has positive degree, its coefficients a_i(t) are
polynomials in F[t] whose greatest common divisor is equal to 1, and the leading coefficient
a_n(t) is monic. A primitive polynomial will be an element of the polynomial ring F[t, x].

It is true again that the product of primitive polynomials is primitive, and that every
element f(t, x) of ℱ[x] can be written in the form c(t) f_0(t, x), where f_0 is a primitive
polynomial in F[t, x] and c is a rational function in t, both uniquely determined up to
constant factor.

The proofs of the next assertions are almost identical to the proofs of Proposition 12.3.4
and Theorems 12.3.6 and 12.3.8.


Theorem 12.3.9 Let F[t] be a polynomial ring in one variable over a field F, and let
ℱ = F(t) be its field of fractions.

(a) The product of primitive polynomials in F[t, x] is primitive.

(b) Let f_0 be a primitive polynomial, and let g be a polynomial in F[t, x]. If f_0 divides g in
ℱ[x], then f_0 divides g in F[t, x].

(c) If two polynomials f and g in F[t, x] have a common nonconstant factor in ℱ[x], they
have a common nonconstant factor in F[t, x].

(d) Let f be an element of F[t, x] whose leading coefficient is monic. Then f is an
irreducible element of F[t, x] if and only if it is either an irreducible polynomial in t
alone, or a primitive polynomial that is irreducible in ℱ[x].

(e) The ring F[t, x] is a unique factorization domain. □


The results about factoring in Z[x] also have analogues for polynomials with coefficients
in any unique factorization domain R.


Theorem 12.3.10 If R is a unique factorization domain, the polynomial ring R[x_1, ..., x_n]
in any number of variables is a unique factorization domain.


Note: In contrast to the case of one variable, where every complex polynomial is a product of 
linear polynomials, complex polynomials in two variables are often irreducible, and therefore 
prime elements, of C[t, x]. □


12.4 FACTORING INTEGER POLYNOMIALS 


We pose the problem of factoring an integer polynomial 
(12.4.1) f(x) = a_n x^n + ⋯ + a_1 x + a_0,
with a_n ≠ 0. Linear factors can be found fairly easily.


Lemma 12.4.2

(a) If an integer polynomial b_1 x + b_0 divides f in Z[x], then b_1 divides a_n and b_0
divides a_0.

(b) A primitive polynomial b_1 x + b_0 divides f in Z[x] if and only if the rational number
−b_0/b_1 is a root of f.

(c) A rational root of a monic integer polynomial f is an integer. 


Proof. (a) The constant coefficient of a product (b_1 x + b_0)(q_{n−1} x^{n−1} + ⋯ + q_0) is b_0 q_0,
and if q_{n−1} ≠ 0, the leading coefficient is b_1 q_{n−1}.


(b) According to Theorem 12.3.6(a), b_1 x + b_0 divides f in Z[x] if and only if it divides f in
Q[x], and this is true if and only if x + b_0/b_1 divides f, i.e., −b_0/b_1 is a root.

(c) If α = a/b is a root, written with b > 0, and if gcd(a, b) = 1, then bx − a is a primitive
polynomial that divides the monic polynomial f, so b = 1 and α is an integer. □
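
Lemma 12.4.2 turns the search for linear factors into a finite check: a rational root −b_0/b_1 in lowest terms must have b_1 dividing a_n and b_0 dividing a_0. A minimal sketch of that search, assuming a nonzero constant term and coefficients listed lowest degree first (the function names are ours):

```python
from fractions import Fraction
from math import gcd


def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def evaluate(coeffs, x):
    """Horner evaluation; coeffs are listed lowest degree first."""
    value = Fraction(0)
    for a in reversed(coeffs):
        value = value * x + a
    return value


def rational_roots(coeffs):
    """Rational roots of an integer polynomial with nonzero constant term."""
    a0, an = coeffs[0], coeffs[-1]
    roots = set()
    for b1 in divisors(an):          # b1 divides the leading coefficient
        for b0 in divisors(a0):      # b0 divides the constant coefficient
            if gcd(b0, b1) != 1:
                continue             # only candidates in lowest terms are needed
            for cand in (Fraction(b0, b1), Fraction(-b0, b1)):
                if evaluate(coeffs, cand) == 0:
                    roots.add(cand)
    return sorted(roots)


print(rational_roots([3, 9, 9, 6]))    # 6x^3 + 9x^2 + 9x + 3
print(rational_roots([2, -1, -2, 1]))  # x^3 - 2x^2 - x + 2, monic
```

The first polynomial has the single rational root −1/2, matching its linear factor 2x + 1; the roots of the monic second polynomial are all integers, as part (c) predicts.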


The homomorphism ψ_p : Z[x] → F_p[x] (12.3.1) is useful for explicit factoring, one
reason being that there are only finitely many polynomials in F_p[x] of each degree.


Proposition 12.4.3 Let f(x) = a_n x^n + ⋯ + a_0 be an integer polynomial, and let p be a
prime integer that does not divide the leading coefficient a_n. If the residue f̄ of f modulo p
is an irreducible element of F_p[x], then f is an irreducible element of Q[x].


Proof. We prove the contrapositive, that if f is reducible, then f̄ is reducible. Suppose that
f = gh is a proper factorization of f in Q[x]. We may assume that g and h are in Z[x]
(12.3.6). Since the factorization in Q[x] is proper, both g and h have positive degree, and, if
deg f denotes the degree of f, then deg f = deg g + deg h.

Since ψ_p is a homomorphism, f̄ = ḡh̄, so deg f̄ = deg ḡ + deg h̄. For any integer
polynomial p, deg p̄ ≤ deg p. Our assumption on the leading coefficient of f tells us that
deg f̄ = deg f. This being so we must have deg ḡ = deg g and deg h̄ = deg h. Therefore
the factorization f̄ = ḡh̄ is proper. □


If p divides the leading coefficient of f, then f̄ has lower degree, and using reduction
modulo p becomes harder. 


If we suspect that an integer polynomial is irreducible, we can try reduction modulo p 
for a small prime, p = 2 or 3 for instance, and hope that f̄ turns out to be irreducible and of
the same degree as f. If so, f will be irreducible too. Unfortunately, there exist irreducible
integer polynomials that can be factored modulo every prime p. The polynomial x^4 − 10x^2 + 1
is an example. So the method of reduction modulo p may not work. But it does work 
quite often. 
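
The behavior of x^4 − 10x^2 + 1 can be observed for small primes by brute force. A polynomial of degree 4 is reducible exactly when it has a monic factor of degree 1 or 2, so the sketch below simply tries every such candidate in F_p[x]. It is an experiment for a handful of primes, not a proof for all p, and the helper names are ours.

```python
from itertools import product


def poly_mod(g, f, p):
    """Remainder of g modulo a monic polynomial f, coefficients reduced mod p.

    Polynomials are coefficient lists, lowest degree first.
    """
    r = [c % p for c in g]
    while len(r) >= len(f):
        coeff = r[-1]
        shift = len(r) - len(f)
        for i, c in enumerate(f):
            r[i + shift] = (r[i + shift] - coeff * c) % p
        r.pop()                      # the leading term has been cancelled
    return r


def has_factor_of_degree_at_most_2(g, p):
    """Try every monic polynomial of degree 1 or 2 over F_p as a divisor of g."""
    for deg in (1, 2):
        for lower in product(range(p), repeat=deg):
            f = list(lower) + [1]
            if not any(poly_mod(g, f, p)):
                return True
    return False


f = [1, 0, -10, 0, 1]                # x^4 - 10x^2 + 1
for p in (2, 3, 5, 7, 11, 13):
    print(p, has_factor_of_degree_at_most_2(f, p))
```

For comparison, the same test applied to x^4 + x + 1 with p = 2 finds no factor, so Proposition 12.4.3 shows that x^4 + x + 1 is irreducible in Q[x].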


The irreducible polynomials in F_p[x] can be found by the “sieve” method. The sieve
of Eratosthenes is the name given to the following method of determining the prime integers
less than a given number n. We list the integers from 2 to n. The first one, 2, is prime because
any proper factor of 2 must be smaller than 2, and there is no smaller integer on our list. We 
note that 2 is prime, and we cross out the multiples of 2 from our list. Except for 2 itself, 
they are not prime. The first integer that is left, 3, is a prime because it isn’t divisible by any 
smaller prime. We note that 3 is a prime and then cross out the multiples of 3 from our list. 
Again, the smallest remaining integer, 5, is a prime, and so on. 


2  3  (4)  5  (6)  7  (8)  (9)  (10)  11  (12)  13  (14)  (15)  (16)  17  (18)  19  ...   [crossed-out integers are shown in parentheses]


The same method will determine the irreducible polynomials in F_p[x]. We list the
monic polynomials, degree by degree, and cross out products. For example, the linear
polynomials in F_2[x] are x and x + 1. They are irreducible. The polynomials of degree 2 are
x^2, x^2 + x, x^2 + 1, and x^2 + x + 1. The first three have roots in F_2, so they are divisible by x
or by x + 1. The last one, x^2 + x + 1, is the only irreducible polynomial of degree 2 in F_2[x].

(12.4.4) The irreducible polynomials of degree ≤ 4 in F_2[x]:
x, x + 1; x^2 + x + 1; x^3 + x^2 + 1, x^3 + x + 1;
x^4 + x^3 + 1, x^4 + x + 1, x^4 + x^3 + x^2 + x + 1.
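
The sieve for F_2[x] is easy to carry out by machine. In the sketch below, which is our own illustration, a polynomial over F_2 is encoded as a Python integer whose bit i is the coefficient of x^i, so that multiplication becomes carry-less binary multiplication. The names clmul, irreducibles_f2 and show are hypothetical.

```python
def clmul(a, b):
    """Product of two polynomials over F_2, encoded as bitmasks (bit i <-> x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a              # addition in F_2[x] is XOR
        a <<= 1
        b >>= 1
    return result


def irreducibles_f2(max_degree):
    """Sieve: list the polynomials of degree 1..max_degree and cross out products."""
    limit = 1 << (max_degree + 1)    # encodings of degree <= max_degree are < limit
    crossed_out = set()
    for a in range(2, limit):        # a, b run over polynomials of degree >= 1
        for b in range(a, limit):
            prod = clmul(a, b)
            if prod < limit:
                crossed_out.add(prod)
    return [f for f in range(2, limit) if f not in crossed_out]


def show(f):
    """Render an encoded polynomial, highest degree first."""
    terms = [("x^%d" % i if i > 1 else "x" if i == 1 else "1")
             for i in range(f.bit_length() - 1, -1, -1) if (f >> i) & 1]
    return " + ".join(terms)


for f in irreducibles_f2(4):
    print(show(f))
```

The output reproduces list (12.4.4): two polynomials of degree 1, one of degree 2, two of degree 3, and three of degree 4.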


By trying the polynomials on this list, we can factor polynomials of degree at most 9 in
F_2[x]. For example, let’s factor f(x) = x^5 + x^3 + 1 in F_2[x]. If it factors, there must be an
irreducible factor of degree at most 2. Neither 0 nor 1 is a root, so f has no linear factor.
There is only one irreducible polynomial of degree 2, namely p = x^2 + x + 1. We carry out
division with remainder: f(x) = p(x)(x^3 + x^2 + x) + (x + 1). So p doesn’t divide f, and
therefore f is irreducible.

Consequently, the integer polynomial x^5 − 64x^4 + 127x^3 − 200x + 99 is irreducible in
Q[x], because its residue in F_2[x] is the irreducible polynomial x^5 + x^3 + 1.
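
The two computations in this example, the reduction modulo 2 and the division with remainder by x^2 + x + 1, can be reproduced as follows. This is a small sketch with coefficient lists written lowest degree first; divmod_mod_p is our own helper and assumes a monic divisor.

```python
def divmod_mod_p(g, f, p):
    """Division with remainder in F_p[x]; f must be monic. Lowest degree first."""
    r = [c % p for c in g]
    q = [0] * max(len(r) - len(f) + 1, 0)
    for shift in range(len(r) - len(f), -1, -1):
        coeff = r[shift + len(f) - 1]        # current leading coefficient
        q[shift] = coeff
        for i, c in enumerate(f):
            r[shift + i] = (r[shift + i] - coeff * c) % p
    return q, r[:len(f) - 1]


f_int = [99, -200, 0, 127, -64, 1]   # x^5 - 64x^4 + 127x^3 - 200x + 99
f_bar = [c % 2 for c in f_int]        # residue modulo 2
print(f_bar)                          # coefficients of 1 + x^3 + x^5

q, r = divmod_mod_p(f_bar, [1, 1, 1], 2)   # divide by x^2 + x + 1
print(q, r)

# No linear factor: f_bar has no root in F_2.
print([sum(c * x**i for i, c in enumerate(f_bar)) % 2 for x in (0, 1)])
```

The quotient is x^3 + x^2 + x and the remainder is x + 1, in agreement with the division displayed above.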


(12.4.5) The monic irreducible polynomials of degree 2 in F_3[x]:


x^2 + 1, x^2 + x − 1, x^2 − x − 1.


Reduction modulo p may help describe the factorization of a polynomial also when the 
residue is reducible. Consider the polynomial f(x) = x^3 + 3x^2 + 9x + 6. Reducing modulo
3, we obtain x^3. This doesn’t look like a promising tool. However, suppose that f(x) were
reducible in Z[x], say f(x) = (x + a)(x^2 + bx + c). Then the residue of x + a would divide x^3
in F_3[x], which would imply a ≡ 0 modulo 3. Similarly, we could conclude c ≡ 0 modulo 3. It
is impossible to satisfy both of these conditions because the constant term ac of the product 
is supposed to be equal to 6. Therefore no such factorization exists, and f(x) is irreducible. 

The principle at work in this example is called the Eisenstein Criterion. 


Proposition 12.4.6 Eisenstein Criterion. Let f(x) = a_n x^n + ⋯ + a_0 be an integer polynomial
and let p be a prime integer. Suppose that the coefficients of f satisfy the following conditions:


• p does not divide a_n;
• p divides all other coefficients a_{n−1}, ..., a_0;
• p^2 does not divide a_0.


Then f is an irreducible element of Q[x].


For example, the polynomial x^4 + 25x^2 + 30x + 20 is irreducible in Q[x].
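
The three conditions of the criterion are straightforward to test. The following sketch (with a hypothetical function name) takes the coefficients a_0, ..., a_n, lowest degree first, and a prime p supplied by the caller. A result of False says only that the criterion does not apply for that p, not that the polynomial is reducible.

```python
def eisenstein_applies(coeffs, p):
    """Check the three Eisenstein conditions for the prime p.

    coeffs lists the integer coefficients a_0, ..., a_n (lowest degree first).
    """
    a0, an = coeffs[0], coeffs[-1]
    return (an % p != 0                                   # p does not divide a_n
            and all(a % p == 0 for a in coeffs[:-1])      # p divides a_{n-1}, ..., a_0
            and a0 % (p * p) != 0)                        # p^2 does not divide a_0


print(eisenstein_applies([20, 30, 25, 0, 1], 5))   # x^4 + 25x^2 + 30x + 20, p = 5
print(eisenstein_applies([6, 9, 3, 1], 3))         # x^3 + 3x^2 + 9x + 6, p = 3
print(eisenstein_applies([1, 0, 1], 2))            # x^2 + 1: criterion is silent
```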


Proof of the Eisenstein Criterion. Assume that f satisfies the conditions, and let f̄ denote
the residue of f modulo p. The hypotheses imply that f̄ = ā_n x^n and that ā_n ≠ 0. If f is
reducible in Q[x], it will factor in Z[x] into factors of positive degree, say f = gh, where
g(x) = b_r x^r + ⋯ + b_0 and h(x) = c_s x^s + ⋯ + c_0. Then ḡ divides ā_n x^n, so ḡ has the form
b̄_r x^r. Every coefficient of g except the leading coefficient is divisible by p. The same is true
of h. The constant coefficient a_0 of f will be equal to b_0 c_0, and since p divides b_0 and c_0,
p^2 must divide a_0. This contradicts the third condition. Therefore f is irreducible. □

One application of the Eisenstein Criterion is to prove the irreducibility of the
cyclotomic polynomial Φ(x) = x^{p−1} + x^{p−2} + ⋯ + x + 1, where p is a prime. Its roots are
the pth roots of unity, the powers of ζ = e^{2πi/p} different from 1:


(12.4.7) (x − 1)Φ(x) = x^p − 1.


Lemma 12.4.8 Let p be a prime integer. The binomial coefficient (p choose r) is an integer divisible
exactly once by p for every r in the range 1 ≤ r < p.


Proof. The binomial coefficient (p choose r) is


(p choose r) = p(p − 1) ⋯ (p − r + 1) / (r(r − 1) ⋯ 1).


When r < p, the terms in the denominator are all less than p, so they cannot cancel the
single p that is in the numerator. Therefore (p choose r) is divisible exactly once by p. □


Theorem 12.4.9 Let p be a prime. The cyclotomic polynomial Φ(x) = x^{p−1} + x^{p−2} + ⋯ +
x + 1 is irreducible over Q.


Proof. We substitute x = y + 1 into (12.4.7) and expand the result:


yΦ(y + 1) = (y + 1)^p − 1 = y^p + (p choose 1) y^{p−1} + ⋯ + (p choose p−1) y + 1 − 1.

We cancel y. The lemma shows that the Eisenstein Criterion applies, and that Φ(y + 1) is
irreducible. It follows that Φ(x) is irreducible too. □
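
The substitution in this proof can be made concrete. After cancelling y, the coefficient of y^j in Φ(y + 1) is the binomial coefficient (p choose j+1), so the coefficients can be listed with math.comb and the Eisenstein conditions checked directly. The function names below are ours, and the check is repeated inline so that the block is self-contained.

```python
from math import comb


def shifted_cyclotomic(p):
    """Coefficients of Phi(y + 1) = ((y + 1)^p - 1) / y, lowest degree first."""
    return [comb(p, j + 1) for j in range(p)]


def eisenstein_applies(coeffs, p):
    """The three Eisenstein conditions for the prime p (coeffs lowest degree first)."""
    return (coeffs[-1] % p != 0
            and all(a % p == 0 for a in coeffs[:-1])
            and coeffs[0] % (p * p) != 0)


for p in (3, 5, 7, 11, 13):
    coeffs = shifted_cyclotomic(p)
    print(p, coeffs, eisenstein_applies(coeffs, p))
```

For p = 5 the list is 5, 10, 10, 5, 1: the leading coefficient is 1, every other coefficient is divisible by 5, and the constant term 5 is not divisible by 25.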


Estimating the Coefficients 


Computer programs factor integer polynomials by factoring modulo powers of a prime, 
usually the prime p = 2. There are fast algorithms, the Berlekamp algorithms, to do this. 
The simplest case is that f is a monic integer polynomial whose residue modulo p is the
product of relatively prime monic polynomials, say f̄ = ḡh̄ in F_p[x]. Then there will be a
unique way to factor f modulo any power of p. (We won’t take the time to prove this.)
Let’s suppose that this is so, and that we (or the computer) have factored modulo the powers
p, p^2, p^3, .... If f factors in Z[x], the coefficients of the factors modulo p^k will stabilize
when they are represented by integers between −p^k/2 and p^k/2, and this will produce the
integer factorization. If f is irreducible in Z[x], the coefficients of the factors won’t stabilize.
When they get too big, one can conclude that the polynomial is irreducible. 

The next theorem of Cauchy can be used to estimate how big the coefficients of the 
integer factors could be. 


Theorem 12.4.10 Let f(x) = x^n + a_{n−1} x^{n−1} + ⋯ + a_1 x + a_0 be a monic polynomial with
complex coefficients, and let r be the maximum of the absolute values |a_i| of its coefficients.
The roots of f have absolute value less than r + 1.

Proof of Theorem 12.4.10. The trick is to rewrite the expression for f in the form

x^n = f − (a_{n−1} x^{n−1} + ⋯ + a_1 x + a_0)

and to use the triangle inequality:

(12.4.11) |x|^n ≤ |f(x)| + |a_{n−1}||x|^{n−1} + ⋯ + |a_1||x| + |a_0|
                ≤ |f(x)| + r(|x|^{n−1} + ⋯ + |x| + 1) = |f(x)| + r (|x|^n − 1)/(|x| − 1).

Let α be a complex number with absolute value |α| ≥ r + 1. Then r/(|α| − 1) ≤ 1. We substitute
x = α into (12.4.11):

|α|^n ≤ |f(α)| + r (|α|^n − 1)/(|α| − 1) ≤ |f(α)| + |α|^n − 1.

Therefore |f(α)| ≥ 1, and α is not a root of f. □
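
Cauchy's bound is easy to try on a polynomial whose roots are known in advance. The sketch below, an illustration of our own, builds the monic polynomial with prescribed roots and compares the roots with r + 1, where r is the largest absolute value among the non-leading coefficients. The function names are hypothetical.

```python
def poly_from_roots(roots):
    """Coefficients (lowest degree first) of the monic polynomial with the given roots."""
    coeffs = [1]
    for root in roots:
        shifted = [0] + coeffs                       # x * (current polynomial)
        scaled = [-root * c for c in coeffs] + [0]   # -root * (current polynomial)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs


def cauchy_bound(coeffs):
    """r + 1, where r is the largest |a_i| among the non-leading coefficients."""
    return max(abs(a) for a in coeffs[:-1]) + 1


roots = [3, -2, 1 + 2j, 1 - 2j]
coeffs = poly_from_roots(roots)      # x^4 - 3x^3 + x^2 + 7x - 30
bound = cauchy_bound(coeffs)
print(coeffs, bound)
print(all(abs(z) < bound for z in roots))
```

Here r = 30, so the theorem guarantees that every root has absolute value less than 31. The bound is far from sharp in this case, but it is cheap to compute from the coefficients alone.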


We give two examples in which r = 1. 


Examples 12.4.12 (a) Let f(x) = x^6 + x^4 + x^3 + x^2 + 1. The irreducible factorization
modulo 2 is

x^6 + x^4 + x^3 + x^2 + 1 = (x^2 + x + 1)(x^4 + x^3 + x^2 + x + 1).

Since the factors are distinct, there is just one way to factor f modulo 2^2, and it is

x^6 + x^4 + x^3 + x^2 + 1 = (x^2 − x + 1)(x^4 + x^3 + x^2 + x + 1), modulo 4.

The factorizations modulo 2^3 and modulo 2^4 are the same. If we had made these computations,
we would guess that this is an integer factorization, which it is.


(b) Let f(x) = x^6 − x^4 + x^3 + x^2 + 1. This polynomial factors in the same way modulo 2. If
f were reducible in Z[x], it would have a quadratic factor x^2 + ax + b, and b would be the
product of two roots of f. Cauchy’s theorem tells us that the roots have absolute value less
than 2, so |b| < 4. Computing modulo 2^4,


x^6 − x^4 + x^3 + x^2 + 1 = (x^2 + x − 5)(x^4 − x^3 + 5x^2 + 7x + 3), modulo 16.


The constant coefficient of the quadratic factor is −5. This is too big, so f is irreducible.


Note: It isn’t necessary to use Cauchy’s Theorem here. Since the constant coefficient of f is
1, the fact that −5 ≢ ±1 modulo 16 also proves that f is irreducible. □
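
The factorizations in Examples 12.4.12 can be verified by multiplying out. The sketch below only checks the stated products; it does not implement the lifting procedure that produces them. Coefficient lists are lowest degree first, and the helper names are ours.

```python
def poly_mul(f, g):
    """Product of two integer polynomials, coefficients lowest degree first."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def congruent(f, g, m):
    """True if f and g have the same length and agree coefficientwise modulo m."""
    return len(f) == len(g) and all((a - b) % m == 0 for a, b in zip(f, g))


fa = [1, 0, 1, 1, 1, 0, 1]     # x^6 + x^4 + x^3 + x^2 + 1
fb = [1, 0, 1, 1, -1, 0, 1]    # x^6 - x^4 + x^3 + x^2 + 1

# (a): (x^2 - x + 1)(x^4 + x^3 + x^2 + x + 1) equals fa over the integers.
print(poly_mul([1, -1, 1], [1, 1, 1, 1, 1]) == fa)

# (b): (x^2 + x - 5)(x^4 - x^3 + 5x^2 + 7x + 3) agrees with fb modulo 16 only.
prod_b = poly_mul([-5, 1, 1], [3, 7, 5, -1, 1])
print(prod_b, congruent(prod_b, fb, 16), congruent(prod_b, fb, 32))
```

In case (a) the product is exact, which is why the coefficients stabilize. In case (b) the product agrees with f modulo 16 but not modulo 32, so this pair of factors is not an integer factorization.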


The computer implementations for factoring are interesting, but they are painful to
carry out by hand. It is unpleasant to determine a factorization modulo 16 such as the one
above by hand, though it can be done by linear algebra. We won’t discuss computer methods
further. If you want to pursue this topic, see [LL&L].

12.5 GAUSS PRIMES


We have seen that the ring Z[i] of Gauss integers is a Euclidean domain. Every element that 
is not zero and not a unit is a product of prime elements. In this section we describe these 
prime elements, called Gauss primes, and their relation to integer primes. 

In Z[i], 5 = (2 + i)(2 − i), and the factors 2 + i and 2 − i are Gauss primes. On the
other hand, the integer 3 doesn’t have a proper factor in Z[i]. It is itself a Gauss prime. These 
examples exhibit the two ways that prime integers can factor in the ring of Gauss integers. 

The next lemma follows directly from the definition of a Gauss integer: 


Lemma 12.5.1
• A Gauss integer that is a real number is an integer.
• An integer d divides a Gauss integer a + bi in the ring Z[i] if and only if d divides both a
and b in Z. □


Theorem 12.5.2

(a) Let π be a Gauss prime, and let π̄ be its complex conjugate. Then π̄π is either an integer
prime or the square of an integer prime.

(b) Let p be an integer prime. Then p is either a Gauss prime or the product π̄π of a Gauss
prime and its complex conjugate.

(c) The integer primes p that are Gauss primes are those congruent to 3 modulo 4:
p = 3, 7, 11, 19, ...

(d) Let p be an integer prime. The following are equivalent:

(i) p is the product of complex conjugate Gauss primes.
(ii) p is congruent to 1 modulo 4, or p = 2: p = 2, 5, 13, 17, ...
(iii) p is the sum of two integer squares: p = a^2 + b^2.
(iv) The residue of −1 is a square modulo p.


Proof of Theorem 12.5.2 (a) Let π be a Gauss prime, say π = a + bi. We factor the positive
integer π̄π = a^2 + b^2 in the ring of integers: π̄π = p_1 ··· p_k. This equation is also true in the
Gauss integers, though it is not necessarily a prime factorization in that ring. We continue
factoring each p_i if possible, to arrive at a prime factorization in Z[i]. Because the Gauss
integers have unique factorization, the prime factors we obtain must be associates of the two
factors π and π̄. Therefore k is at most two. Either π̄π is an integer prime, or else it is the
product of two integer primes. Suppose that π̄π = p_1 p_2, and say that π is an associate of
the integer prime p_1, i.e., that π = ±p_1 or ±i p_1. Then π̄ is also an associate of p_1, and so is
p_2, so p_1 = p_2, and π̄π = p_1^2.

(b) If p is an integer prime, it is not a unit in Z[i]. (The units are ±1, ±i.) So p is divisible by
a Gauss prime π. Then π̄ divides p̄, and p̄ = p. So the integer π̄π divides p^2 in Z[i] and
also in Z. Therefore π̄π is equal to p or p^2. If π̄π = p^2, then π and p are associates, so p is
a Gauss prime.


Part (c) of the theorem follows from (b) and (d), so we need not consider it further, and we
turn to the proof of (d). It is easy to see that (d)(i) and (d)(iii) are equivalent: If p = π̄π
for some Gauss prime, say π = a + bi, then p = a^2 + b^2 is a sum of two integer squares.
Conversely, if p = a^2 + b^2, then p factors in the Gauss integers: p = (a − bi)(a + bi), and
(a) shows that the two factors are Gauss primes. □


Lemma 12.5.3 below shows that (d)(i) and (d)(iv) are equivalent, because (12.5.3)(a) 
is the negation of (d)(i) and (12.5.3)(c) is the negation of (d)(iv). 


Lemma 12.5.3 Let p be an integer prime. The following statements are equivalent:

(a) p is a Gauss prime;
(b) the quotient ring R = Z[i]/(p) is a field;
(c) x^2 + 1 is an irreducible element of F_p[x] (12.2.8)(c).


Proof. The equivalence of the first two statements follows from the fact that Z[i]/(p) is a 
field if and only if the principal ideal (p) of Z[i] is a maximal ideal, and this is true if and 
only if p is a Gauss prime (see (12.2.9)). 

What we are really after is the equivalence of (a) and (c), and at first glance these
statements don’t seem to be related at all. It is in order to obtain this equivalence that we
introduce the auxiliary ring R = Z[i]/(p). This ring can be obtained from the polynomial
ring Z[x] in two steps: first killing the polynomial x^2 + 1, which yields a ring isomorphic to
Z[i], and then killing the prime p in that ring. We may just as well introduce these relations
in the opposite order. Killing the prime p first gives us the polynomial ring F_p[x], and then
killing x^2 + 1 yields R again, as is summed up in the diagram below.

(12.5.4)
                 kill p
       Z[x] -----------> F_p[x]
         |                  |
    kill x^2 + 1       kill x^2 + 1
         v                  v
       Z[i] ----------->    R
                 kill p

We now have two ways to decide whether or not R is a field. First, R will be a field if
and only if the ideal (p) in the ring Z[i] is a maximal ideal, which will be true if and only if p
is a Gauss prime. Second, R will be a field if and only if the ideal (x^2 + 1) in the ring F_p[x]
is a maximal ideal, which will be true if and only if x^2 + 1 is an irreducible element of that
ring (12.2.9). This shows that (a) and (c) of Lemma 12.5.3 are equivalent. □


To complete the proof of equivalence of (i)—(iv) of Theorem 12.5.2(d), it suffices to 
show that (ii) and (iv) are equivalent. It is true that -1 is a square modulo 2. We look at the 
primes different from 2. The next lemma does the job: 


Lemma 12.5.5 Let p be an odd prime.

(a) The multiplicative group F_p^× contains an element of order 4 if and only if p ≡ 1
modulo 4.

(b) The integer a solves the congruence x^2 ≡ −1 modulo p if and only if its residue ā is an
element of order 4 in the multiplicative group F_p^×.

Proof. (a) This follows from a fact mentioned before, that the multiplicative group F_p^× is a
cyclic group (see (15.7.3)). We give an ad hoc proof here. The order of an element divides the
order of the group. So if ā has order 4 in F_p^×, then the order of F_p^×, which is p − 1, is divisible
by 4. Conversely, suppose that p − 1 is divisible by 4. We consider the homomorphism
φ: F_p^× → F_p^× that sends x ⇝ x^2. The only elements of F_p^× whose squares are 1 are ±1 (see
(12.2.20)). So the kernel of φ is {±1}. Therefore its image, call it H, has even order (p − 1)/2.
The first Sylow Theorem shows that H contains an element of order 2. That element is the
square of an element x of order 4.

(b) The residue ā has order 4 if and only if ā^2 has order 2. There is just one element in F_p^× of
order 2, namely the residue of −1. So ā has order 4 if and only if ā^2 = −1. □


This completes the proof of Theorem 12.5.2. □
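
The equivalences in Theorem 12.5.2(d) lend themselves to a quick experiment. The Python sketch below is our own illustration, with function names chosen for this example. It tabulates conditions (ii), (iii), and (iv) for the primes below 60 by brute force. It does not test condition (i) directly, since that would require factoring in Z[i].

```python
from math import isqrt

def is_prime(n):
    """Trial division, adequate for small n."""
    return n >= 2 and all(n % k for k in range(2, isqrt(n) + 1))

def is_sum_of_two_squares(p):
    """Condition (iii): p = a^2 + b^2 for some integers a, b."""
    return any(isqrt(p - a * a) ** 2 == p - a * a for a in range(isqrt(p) + 1))

def minus_one_is_square(p):
    """Condition (iv): some x satisfies x^2 ≡ -1 modulo p."""
    return any((x * x + 1) % p == 0 for x in range(p))

for p in filter(is_prime, range(2, 60)):
    cond_ii = p == 2 or p % 4 == 1
    cond_iii = is_sum_of_two_squares(p)
    cond_iv = minus_one_is_square(p)
    print(p, cond_ii, cond_iii, cond_iv)
```

For each prime the three printed truth values should agree, as the theorem asserts. For instance 13 = 2^2 + 3^2 and 5^2 ≡ −1 modulo 13, while 7 admits neither a representation as a sum of two squares nor a square root of −1.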


You want to hit home run without going into spring training? 


—Kenkichi Iwasawa


EXERCISES 


Section 1 Factoring Integers

1.1. Prove that a positive integer n that is not an integer square is not the square of a rational
number.

1.2. (partial fractions)
(a) Write the fraction 7/24 in the form a/8 + b/3.
(b) Prove that if n = uv, where u and v are relatively prime, then every fraction
q = m/n can be written in the form q = a/u + b/v.

1.3. (Chinese Remainder Theorem)
(a) Let n and m be relatively prime integers, and let a and b be arbitrary integers. Prove
that there is an integer x that solves the simultaneous congruence x ≡ a modulo m
and x ≡ b modulo n.
(b) Determine all solutions of these two congruences.

1.4. Solve the following simultaneous congruences:
(a) x ≡ 3 modulo 8, x ≡ 2 modulo 5,
(b) x ≡ 3 modulo 15, x ≡ 5 modulo 8, x ≡ 2 modulo 7,
(c) x ≡ 13 modulo 43, x ≡ 7 modulo 71.


1.5. Let a and b be relatively prime integers. Prove that there are integers m and n such that
a^m + b^n ≡ 1 modulo ab.

Section 2 Unique Factorization Domains

2.1. Factor the following polynomials into irreducible factors in F_p[x].
(a) x^3 + x^2 + x + 1, p = 2, (b) x^2 − 3x − 3, p = 5, (c) x^2 + 1, p = 7

2.2. Compute the greatest common divisor of the polynomials x^6 + x^4 + x^3 + x^2 + x + 1 and
x^5 + 2x^3 + x^2 + x + 1 in Q[x].

2.3. How many roots does the polynomial x^2 − 2 have, modulo 8?

2.4. Euclid proved that there are infinitely many prime integers in the following way: If
p_1, ..., p_k are primes, then any prime factor p of (p_1 ··· p_k) + 1 must be different from
all of the p_i. Adapt this argument to prove that for any field F there are infinitely many
monic irreducible polynomials in F[x].

2.5. (partial fractions for polynomials)
(a) Prove that every element of C(x) can be written as a sum of a polynomial and a
linear combination of functions of the form 1/(x − a)^i.
(b) Exhibit a basis for the field C(x) of rational functions as a vector space over C.

2.6. Prove that the following rings are Euclidean domains.
(a) Z[ω], ω = e^{2πi/3}, (b) Z[√−2].

2.7. Let a and b be integers. Prove that their greatest common divisor in the ring of integers
is the same as their greatest common divisor in the ring of Gauss integers.

2.8. Describe a systematic way to do division with remainder in Z[i]. Use it to divide 4 + 36i
by 5 + i.

2.9. Let F be a field. Prove that the ring F[x, x^{-1}] of Laurent polynomials (Chapter 11,
Exercise 5.7) is a principal ideal domain.

2.10. Prove that the ring R[[t]] of formal power series (Chapter 11, Exercise 2.2) is a unique
factorization domain.


Section 3 Gauss’s Lemma

3.1. Let φ denote the homomorphism Z[x] → R defined by
(a) φ(x) = 1 + √2, (b) φ(x) = 1/2 + √2.
Is the kernel of φ a principal ideal? If so, find a generator.

3.2. Prove that two integer polynomials are relatively prime elements of Q[x] if and only if
the ideal they generate in Z[x] contains a nonzero integer.

3.3. State and prove a version of Gauss’s Lemma for Euclidean domains.

3.4. Let x, y, z, w be variables. Prove that xy − zw, the determinant of a variable 2×2 matrix,
is an irreducible element of the polynomial ring C[x, y, z, w].

3.5. (a) Consider the map ψ: C[x, y] → C[t] defined by f(x, y) ⇝ f(t^2, t^3). Prove that its
image is the set of polynomials p(t) such that p′(0) = 0.
(b) Consider the map φ: C[x, y] → C[t] defined by f(x, y) ⇝ f(t^2 − t, t^3 − t^2). Prove that
ker φ is a principal ideal, and find a generator g(x, y) for this ideal. Prove that the
image of φ is the set of polynomials p(t) such that p(0) = p(1). Give an intuitive
explanation in terms of the geometry of the variety {g = 0} in C^2.

3.6. Let α be a complex number. Prove that the kernel of the substitution map Z[x] → C that
sends x ⇝ α is a principal ideal, and describe its generator.


Section 4 Factoring Integer Polynomials

4.1. (a) Factor x^9 − x and x^9 − 1 in F_3[x]. (b) Factor x^16 − x in F_2[x].

4.2. Prove that the following polynomials are irreducible:
(a) x^2 + 1, in F_7[x], (b) x^3 − 9, in F_31[x].

4.3. Decide whether or not the polynomial x^4 + 6x^3 + 9x + 3 generates a maximal ideal
in Q[x].

4.4. Factor the integer polynomial x^5 + 2x^4 + 3x^3 + 3x + 5 modulo 2, modulo 3, and in Q.

4.5. Which of the following polynomials are irreducible in Q[x]?
(a) x^2 + 27x + 213, (b) 8x^3 − 6x + 1, (c) x^3 + 6x^2 + 1, (d) x^5 − 3x^4 + 3.

4.6. Factor x^5 + 5x + 5 into irreducible factors in Q[x] and in F_2[x].

4.7. Factor x^3 + x + 1 in F_p[x], when p = 2, 3, and 5.

4.8. How might a polynomial f(x) = x^4 + bx^2 + c with coefficients in a field F factor in F[x]?
Explain with reference to the particular polynomials x^4 + 4x^2 + 4 and x^4 + 3x^2 + 4.

4.9. For which primes p and which integers n is the polynomial x^n − p irreducible in Q[x]?

4.10. Factor the following polynomials in Q[x]. (a) x^2 + 2351x + 125, (b) x^3 + 2x^2 + 3x + 1,
(c) x^4 + 2x^3 + 2x^2 + 2x + 2, (d) x^4 + 2x^3 + 3x^2 + 2x + 1, (e) x^4 + 2x^3 + x^2 + 2x + 1,
(f) x^4 + 2x^2 + x + 1, (g) x^8 + x^6 + x^4 + x^2 + 1, (h) x^6 − 2x^5 − 3x^2 + 9x − 3, (j) x^4 + x^2 + 1,
(k) 3x^5 + 6x^4 + 9x^3 + 3x^2 − 1, (l) x^5 + x^4 + x^2 + x + 2.

4.11. Use the sieve method to determine the primes < 100, and discuss the efficiency of the
sieve: How quickly are the nonprimes filtered out?

4.12. Determine:
(a) the monic irreducible polynomials of degree 3 over F_3,
(b) the monic irreducible polynomials of degree 2 over F_5,
(c) the number of irreducible polynomials of degree 3 over the field F_5.

4.13. Lagrange interpolation formula:
(a) Let a_0, ..., a_n be distinct complex numbers. Determine a polynomial p(x) of degree
n, which has a_1, ..., a_n as roots, and such that p(a_0) = 1.
(b) Let a_0, ..., a_d and b_0, ..., b_d be complex numbers, and suppose that the a_i are
distinct. There is a unique polynomial g of degree ≤ d such that g(a_i) = b_i for each
i = 0, ..., d. Determine the polynomial g explicitly in terms of a_i and b_i.

4.14. By analyzing the locus x^2 + y^2 = 1, prove that the polynomial x^2 + y^2 − 1 is irreducible
in C[x, y].

4.15. With reference to the Eisenstein criterion, what can one say when
(a) f̄ is constant, (b) f̄ = x^n + b̄x^{n-1}?

4.16. Factor x^14 + 8x^13 + 3 in Q[x], using reduction modulo 3 as a guide.

4.17. Using congruence modulo 4 as an aid, factor x^4 + 6x^3 + 7x^2 + 8x + 9 in Q[x].

*4.18. Let q = p^e with p prime, and let r = p^{e-1}. Prove that the cyclotomic polynomial
(x^q − 1)/(x^r − 1) is irreducible.

4.19. Factor x^5 − x^4 − x^2 − 1 modulo 2, modulo 16, and over Q.


Section 5 Gauss Primes

5.1. Factor the following into primes in Z[i]: (a) 1 − 3i, (b) 10, (c) 6 + 9i, (d) 7 + i.

5.2. Find the greatest common divisor in Z[i] of (a) 11 + 7i, 4 + 7i, (b) 11 + 7i, 8 + i,
(c) 3 + 4i, 18 − i.

5.3. Find a generator for the ideal of Z[i] generated by 3 + 4i and 4 + 7i.

5.4. Make a neat drawing showing the primes in the ring of Gauss integers in a reasonable
size range.

5.5. Let π be a Gauss prime. Prove that π and π̄ are associates if and only if π is an associate
of an integer prime, or π̄π = 2.

5.6. Let R be the ring Z[√−3]. Prove that an integer prime p is a prime element of R if and
only if the polynomial x^2 + 3 is irreducible in F_p[x].

5.7. Describe the residue ring Z[i]/(p) for each prime p.

5.8. Let R = Z[ω], where ω = e^{2πi/3}. Make a drawing showing the prime elements of absolute
value < 10 in R.

*5.9. Let R = Z[ω], where ω = e^{2πi/3}. Let p be an integer prime ≠ 3. Adapt the proof of
Theorem 12.5.2 to prove the following:
(a) The polynomial x^2 + x + 1 has a root in F_p if and only if p ≡ 1 modulo 3.
(b) (p) is a maximal ideal of R if and only if p ≡ −1 modulo 3.
(c) p factors in R if and only if it can be written in the form p = a^2 + ab + b^2, for some
integers a and b.

5.10. (a) Let α be a Gauss integer. Assume that α has no integer factor, and that ᾱα is a
square integer. Prove that α is a square in Z[i].
(b) Let a, b, c be integers such that a and b are relatively prime and a^2 + b^2 = c^2. Prove
that there are integers m and n such that a = m^2 − n^2, b = 2mn, and c = m^2 + n^2.


Miscellaneous Problems

M.1. Let S be a commutative semigroup: a set with a commutative and associative law
of composition and with an identity element (Chapter 2, Exercise M.4). Suppose the
Cancellation Law holds in S: If ab = ac then b = c. Make the appropriate definitions
and extend Proposition 12.2.14(a) to this situation.

M.2. Let v_1, ..., v_n be elements of Z^2, and let S be the semigroup of all combinations
a_1 v_1 + ··· + a_n v_n with non-negative integer coefficients a_i, the law of composition being
addition (Chapter 2, Exercise M.4). Determine which of these semigroups has unique
factorization (a) when the coordinates of the vectors v_i are nonnegative, and (b) in
general.
Hint: Begin by translating the terminology (12.2.1) into additive notation.

Suggested by Nathaniel Kuhn.

M.3. Let p be an integer prime, and let A be an n×n integer matrix such that A^p = I but
A ≠ I. Prove that n ≥ p − 1. Give an example with n = p − 1.

*M.4. (a) Let R be the ring of functions that are polynomials in cos t and sin t, with real
coefficients. Prove that R is isomorphic to R[x, y]/(x^2 + y^2 − 1).
(b) Prove that R is not a unique factorization domain.
(c) Prove that S = C[x, y]/(x^2 + y^2 − 1) is a principal ideal domain and hence a unique
factorization domain.
(d) Determine the units in the rings S and R.
Hint: Show that S is isomorphic to a Laurent polynomial ring C[u, u^{-1}].

M.5. For which integers n does the circle x^2 + y^2 = n contain a point with integer coordinates?

M.6. Let R be a domain, and let I be an ideal that is a product of distinct maximal ideals in
two ways, say I = P_1 ··· P_r = Q_1 ··· Q_s. Prove that the two factorizations are the same,
except for the ordering of the terms.

M.7. Let R = Z[x].
(a) Prove that every maximal ideal in R has the form (p, f), where p is an integer prime
and f is a primitive integer polynomial that is irreducible modulo p.
(b) Let I be an ideal of R generated by two polynomials f and g that have no common
factor other than ±1. Prove that R/I is finite.

M.8. Let u and v be relatively prime integers, and let R′ be the ring obtained from Z by
adjoining an element α with the relation vα = u. Prove that R′ is isomorphic to Z[u/v]
and also to Z[1/v].

M.9. Let R denote the ring of Gauss integers, and let W be the R-submodule of V = R^2
generated by the columns of a 2×2 matrix with coefficients in R. Explain how to determine
the index [V : W].

M.10. Let f and g be polynomials in C[x, y] with no common factor. Prove that the ring
R = C[x, y]/(f, g) is a finite-dimensional vector space over C.

M.11. (Berlekamp’s method) The problem here is to factor efficiently in F_2[x]. Solving linear
equations and finding a greatest common divisor are easy compared with factoring. The
derivative f′ of a polynomial f is computed using the rule from calculus, but working
modulo 2. Prove:
(a) (square factors) The derivative f′ is a square, and f′ = 0 if and only if f is a square.
Moreover, gcd(f, f′) is the product of powers of the square factors of f.
(b) (relatively prime factors) Let n be the degree of f. If f = uv, where u and v are
relatively prime, the Chinese Remainder Theorem shows that there is a polynomial
g of degree at most n such that g^2 − g ≡ 0 modulo f, and g can be found by solving
a system of linear equations. Either gcd(f, g) or gcd(f, g − 1) will be a proper
factor of f.
(c) Use this method to factor x^9 + x^6 + x^4 + 1.


CHAPTER 13


Quadratic Number Fields 


Rien n’est beau que le vrai. 


—Hermann Minkowski 


In this chapter, we see how ideals substitute for elements in some interesting rings. We will 
use various facts about plane lattices, and in order not to break up the discussion, we have 
collected them together in Section 13.10 at the end of the chapter. 


13.1 ALGEBRAIC INTEGERS 


A complex number α that is the root of a polynomial with rational coefficients is called an
algebraic number. The kernel of the substitution homomorphism φ: Q[x] → C that sends x
to an algebraic number α is a principal ideal, as are all ideals of Q[x]. It is generated by the
monic polynomial of lowest degree in Q[x] that has α as a root. If α is a root of a product
gh of polynomials, then it is a root of one of the factors. So the monic polynomial of lowest
degree with root α is irreducible. We call this polynomial the irreducible polynomial for α
over Q.


• An algebraic number is an algebraic integer if its (monic) irreducible polynomial over Q
has integer coefficients.

The cube root of unity ω = e^{2πi/3} = (1/2)(−1 + √3 i) is an algebraic integer because its
irreducible polynomial over Q is x^2 + x + 1, while α = (1/2)(1 + √3) is a root of the irreducible
polynomial x^2 − x − 1/2 and is not an algebraic integer.


Lemma 13.1.1 A rational number is an algebraic integer if and only if it is an ordinary integer. 


This is true because the irreducible polynomial over Q for a rational number a is x − a. □

A quadratic number field is a field of the form Q[√d], where d is a fixed integer,
positive or negative, which is not a square in Q. Its elements are the complex numbers

(13.1.2) a + b√d, with a and b in Q.


The notation √d stands for the positive real square root if d > 0 and for the positive
imaginary square root if d < 0. The field Q[√d] is a real quadratic number field if d > 0, and
an imaginary quadratic number field if d < 0.

If d has a square integer factor, we can pull it out of the radical without changing the
field. So we assume d square-free. Then d can be any one of the integers

d = −1, ±2, ±3, ±5, ±6, ±7, ±10, ...


We determine the algebraic integers in a quadratic number field Q[√d] now. Let δ
denote √d, let α = a + bδ be an element of Q[δ] that is not in Q, that is, with b ≠ 0, and let
α′ = a − bδ. Then α and α′ are roots of the polynomial

(13.1.3) (x − α′)(x − α) = x^2 − 2ax + (a^2 − b^2 d),

which has rational coefficients. Since α is not a rational number, it is not the root of a linear
polynomial. So this quadratic polynomial is irreducible over Q. It is therefore the irreducible
polynomial for α over Q.


Corollary 13.1.4 A complex number α = a + bδ with a and b in Q is an algebraic integer if
and only if 2a and a^2 − b^2 d are ordinary integers. □

This corollary is also true when b = 0 and α = a.

The possibilities for a and b depend on congruence modulo 4. Since d is assumed to be
square free, we can’t have d ≡ 0, so d ≡ 1, 2, or 3 modulo 4.


Lemma 13.1.5 Let d be a square-free integer, and let r be a rational number. If r^2 d is an
integer, then r is an integer.

Proof. The square-free integer d cannot cancel a square in the denominator of r^2. □

A half integer is a rational number of the form m + 1/2, where m is an integer.


Proposition 13.1.6 The algebraic integers in the quadratic field Q[δ], with δ^2 = d and d
square free, have the form α = a + bδ, where:

• If d ≡ 2 or 3 modulo 4, then a and b are integers.
• If d ≡ 1 modulo 4, then a and b are either both integers, or both half integers.

The algebraic integers form a ring R, the ring of integers in F = Q[δ].


Proof. We assume that 2a and a^2 − b^2 d are integers, and we analyze the possibilities for a
and b. There are two cases: Either a is an integer, or a is a half integer.

Case 1: a is an integer. Then b^2 d must be an integer. The lemma shows that b is an integer.

Case 2: a = m + 1/2 is a half integer. Then a^2 = m^2 + m + 1/4 will be in the set Z + 1/4. Since
a^2 − b^2 d is an integer, b^2 d is also in Z + 1/4. Then 4b^2 d is an integer and the lemma shows
that 2b is an integer. Since b^2 d is not an integer, b is not an integer. So b is a half integer,
and then b^2 d is in the set Z + 1/4 if and only if d ≡ 1 modulo 4.

The fact that the algebraic integers form a ring is proved by computation. □

The imaginary quadratic case d < 0 is easier to handle than the real case, so we
concentrate on it in the next sections. When d < 0, the algebraic integers form a lattice in the
complex plane. The lattice is rectangular if d ≡ 2 or 3 modulo 4, and “isosceles triangular” if
d ≡ 1 modulo 4.

When d = −1, R is the ring of Gauss integers, and the lattice is square. When d = −3,
the lattice is equilateral triangular. Two other examples are shown below.


[Figure: two lattices, labeled d = −5 and d = −7]
(13.1.7) Integers in Some Imaginary Quadratic Fields.


Being a lattice is a very special property of the rings that we consider here, and the geometry 
of the lattices helps to analyze them. 

When d ≡ 2 or 3 modulo 4, the integers in Q[δ] are the complex numbers a + bδ, with
a and b integers. They form a ring that we denote by Z[δ]. A convenient way to write all the
integers when d ≡ 1 modulo 4 is to introduce the algebraic integer

(13.1.8) η = (1/2)(1 + δ).

It is a root of the monic integer polynomial

(13.1.9) x^2 − x + h,

where h = (1 − d)/4. The algebraic integers in Q[δ] are the complex numbers a + bη, with
a and b integers. The ring of integers is Z[η].
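
The criterion of Corollary 13.1.4 is easy to apply mechanically. The Python sketch below is our own illustration (the function names are assumptions made for this example). It tests whether a + bδ is an algebraic integer by checking that 2a and a^2 − b^2 d are integers, and it compares the answer with the description given in Proposition 13.1.6.

```python
from fractions import Fraction

def is_algebraic_integer(a, b, d):
    """Corollary 13.1.4: a + b*sqrt(d) is an algebraic integer
    if and only if 2a and a^2 - b^2*d are ordinary integers."""
    a, b = Fraction(a), Fraction(b)
    return (2 * a).denominator == 1 and (a * a - b * b * d).denominator == 1

def predicted_by_proposition(a, b, d):
    """Proposition 13.1.6: a and b are integers, or, when d ≡ 1 modulo 4,
    a and b are both half integers."""
    a, b = Fraction(a), Fraction(b)
    both_integers = a.denominator == 1 and b.denominator == 1
    both_half_integers = a.denominator == 2 and b.denominator == 2
    return both_integers or (d % 4 == 1 and both_half_integers)

half = Fraction(1, 2)
print(is_algebraic_integer(-half, half, -3))  # omega, the cube root of unity
print(is_algebraic_integer(half, half, 3))    # (1/2)(1 + sqrt(3))
print(is_algebraic_integer(half, half, -7))   # eta for d = -7
print(is_algebraic_integer(half, half, -5))   # same coordinates, d = -5
```

Python's % operator returns a nonnegative remainder for a positive modulus, so d % 4 == 1 is the correct test for d ≡ 1 modulo 4 even when d is negative.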


13.2 FACTORING ALGEBRAIC INTEGERS 


The symbol R will denote the ring of integers in an imaginary quadratic number field Q[δ].
To focus your attention, it may be best to think at first of the case that d is congruent to 2 or 3
modulo 4, so that the algebraic integers have the form a + bδ, with a and b integers.

When possible, we denote ordinary integers by Latin letters a, b, ..., elements of R
by Greek letters α, β, ..., and ideals by capital letters A, B, ... We work exclusively with
nonzero ideals.

If α = a + bδ is in R, its complex conjugate ᾱ = a − bδ is in R too. These are the roots
of the polynomial x^2 − 2ax + (a^2 − b^2 d) that was introduced in Section 13.1.


• The norm of α = a + bδ is N(α) = ᾱα.

The norm is equal to |α|^2 and also to a^2 − b^2 d. It is a positive integer for all α ≠ 0, and it has
the multiplicative property:

(13.2.1) N(βγ) = N(β)N(γ).

This property gives us some control of the factors of an element. If α = βγ, then both terms
on the right side of (13.2.1) are positive integers. To check for factors of α, it is enough to
look at elements β whose norms divide the norm of α. This is manageable when N(α) is
small. For one thing, it allows us to determine the units of R.


Proposition 13.2.2 Let R be the ring of integers in an imaginary quadratic number field.

• An element α of R is a unit if and only if N(α) = 1. If so, then α^{-1} = ᾱ.
• The units of R are {±1} unless d = −1 or −3.
• When d = −1, R is the ring of Gauss integers, and the units are the four powers of i.
• When d = −3, the units are the six powers of e^{2πi/6} = (1/2)(1 + √−3).

Proof. If α is a unit, then N(α)N(α^{-1}) = N(1) = 1. Since N(α) and N(α^{-1}) are positive
integers, they are both equal to 1. Conversely, if N(α) = ᾱα = 1, then ᾱ is the inverse of α,
so α is a unit. The remaining assertions follow by inspection of the lattice R. □
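
The “inspection of the lattice” can be imitated by a short search. In the Python sketch below, which is our own illustration rather than part of the text, an element of R is written as (x + yδ)/2 with integers x and y, so that its norm is (x^2 − d y^2)/4. The function count_units, a name chosen for this example, counts the elements of norm 1.

```python
def count_units(d, search=3):
    """Count the elements of norm 1 in the ring of integers of Q[sqrt(d)],
    for a square-free integer d < 0.

    An element is written as (x + y*delta)/2, with norm (x^2 - d*y^2)/4.
    Norm 1 forces |x| <= 2, so a small search window suffices.
    """
    count = 0
    for x in range(-2 * search, 2 * search + 1):
        for y in range(-2 * search, 2 * search + 1):
            if d % 4 == 1:
                # a = x/2 and b = y/2 are both integers or both half integers
                in_ring = (x - y) % 2 == 0
            else:
                # a = x/2 and b = y/2 must both be integers
                in_ring = x % 2 == 0 and y % 2 == 0
            if in_ring and x * x - d * y * y == 4:
                count += 1
    return count

for d in (-1, -2, -3, -5, -7):
    print(d, count_units(d))
```

The counts should be 4 for d = −1, 6 for d = −3, and 2 in the other cases, in agreement with the proposition.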


Corollary 13.2.3 Factoring terminates in the ring of integers in an imaginary quadratic 
number field. 


This follows from the fact that factoring terminates in the integers. If α = βγ is a proper
factorization in R, then N(α) = N(β)N(γ) is a proper factorization in Z.


Proposition 13.2.4 Let R be the ring of integers in an imaginary quadratic number field.
Assume that d ≡ 3 modulo 4. Then R is not a unique factorization domain except in the case
d = −1, when R is the ring of Gauss integers.

Proof. This is analogous to what happens when d = −5. Suppose that d ≡ 3 modulo 4 and
that d < −1. The integers in R have the form a + bδ with a, b ∈ Z, and the units are ±1. Let
e = (1 − d)/2. Then

2e = 1 − d = (1 + δ)(1 − δ).

The element 1 − d factors in two ways in R. Since d < −1 and d ≡ 3 modulo 4, we have d ≤ −5,
so there is no element a + bδ whose norm a^2 − b^2 d is equal to 2. Therefore 2, which has
norm 4, is an irreducible element of R. If R were a unique factorization domain, 2 would
divide either 1 + δ or 1 − δ in R, which it does not: (1/2)(1 ± δ) is not an element of R when
d ≡ 3 modulo 4. □

There is a similar statement for the case d ≡ 2 modulo 4. (This is Exercise 2.2.) But
note that the reasoning breaks down when d ≡ 1 modulo 4. In that case, (1/2)(1 + δ) is in R, and
in fact there are more cases of unique factorization when d ≡ 1 modulo 4. A famous theorem
enumerates these cases:

Theorem 13.2.5 The ring of integers R in the imaginary quadratic field Q[√d] is a unique
factorization domain if and only if d is one of the integers −1, −2, −3, −7, −11, −19, −43, −67, −163.


Gauss proved that for these values of d, R has unique factorization. We will learn how to do
this. He also conjectured that there were no others. This much more difficult part of the
theorem was finally proved by Baker, Heegner, and Stark in the middle of the 20th century, after
people had worked on it for more than 150 years. We won’t be able to prove their theorem.
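
The computation behind Proposition 13.2.4 can be repeated for a few values of d. In the Python sketch below, our own illustration, a pair (a, b) stands for a + bδ with δ^2 = d, and the helper names are assumptions made for this example. For each d it checks that (1 + δ)(1 − δ) = 2e and that no element of Z[δ] has norm 2.

```python
from math import isqrt

def multiply(u, v, d):
    """(a + b*delta)(c + e*delta), where delta^2 = d."""
    a, b = u
    c, e = v
    return (a * c + d * b * e, a * e + b * c)

def norm(u, d):
    """N(a + b*delta) = a^2 - b^2*d."""
    a, b = u
    return a * a - d * b * b

def has_element_of_norm(n, d):
    """Search Z[delta], with d < 0, for an element of norm n.
    Since a^2 + |d|*b^2 = n forces |a|, |b| <= sqrt(n), the search is finite."""
    bound = isqrt(n)
    return any(norm((a, b), d) == n
               for a in range(-bound, bound + 1)
               for b in range(-bound, bound + 1))

for d in (-5, -13, -17, -21):  # square-free, d ≡ 3 modulo 4, d < -1
    e = (1 - d) // 2
    product = multiply((1, 1), (1, -1), d)  # (1 + delta)(1 - delta)
    print(d, product == (2 * e, 0), has_element_of_norm(2, d))
```

Since the coordinates of 1 + δ are odd, 2 does not divide 1 + δ in Z[δ], and the same holds for 1 − δ. Together with the absence of elements of norm 2, this is the failure of unique factorization described in the proof.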


13.3 IDEALS IN Z[√−5]


Before going to the general theory, we describe the ideals in the ring R = Z[√−5] as lattices
in the complex plane, using an ad hoc method.


Proposition 13.3.1 Let R be the ring of integers in an imaginary quadratic number field.
Every nonzero ideal of R is a sublattice of the lattice R. Moreover,

• If d ≡ 2 or 3 modulo 4, a sublattice A is an ideal if and only if δA ⊂ A.
• If d ≡ 1 modulo 4, a sublattice A is an ideal if and only if ηA ⊂ A (see (13.1.8)).

Proof. A nonzero ideal A contains a nonzero element α, and (α, αδ) is an independent set
over the real numbers. Also, A is discrete because it is a subgroup of the lattice R. Therefore
A is a lattice (Theorem 6.5.5).

To be an ideal, a subset of R must be closed under addition and under multiplication
by elements of R. Every sublattice A is closed under addition and multiplication by integers.
If A is also closed under multiplication by δ, then it is closed under multiplication by an
element of the form a + bδ, with a and b integers. This includes all elements of R if d ≡ 2 or
3 modulo 4. So A is an ideal. The proof in the case d ≡ 1 modulo 4 is similar. □


We describe ideals in the ring R = Z[δ], when δ^2 = −5.

Lemma 13.3.2 Let R = Z[δ] with δ^2 = −5. The lattice A of integer combinations of 2 and
1 + δ is an ideal.

Proof. The lattice A is closed under multiplication by δ, because δ·2 and δ·(1 + δ) are
integer combinations of 2 and 1 + δ. □
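
The two integer combinations used in this proof can be written out explicitly. In the Python sketch below, our own illustration, a pair (a, b) stands for a + bδ with δ^2 = −5, and the function names are assumptions made for this example. The function coordinates_in_A solves a + bδ = x·2 + y·(1 + δ) for integers x and y, which is possible exactly when a − b is even.

```python
D = -5  # delta^2 = -5; the pair (a, b) stands for a + b*delta

def times_delta(u):
    """Multiply a + b*delta by delta: the result is b*D + a*delta."""
    a, b = u
    return (D * b, a)

def coordinates_in_A(u):
    """Return integers (x, y) with u = x*2 + y*(1 + delta),
    or None if u is not in the lattice A."""
    a, b = u
    if (a - b) % 2 != 0:
        return None
    return ((a - b) // 2, b)

for generator in [(2, 0), (1, 1)]:
    image = times_delta(generator)
    print(generator, "->", image, "=", coordinates_in_A(image))

print(coordinates_in_A((1, 0)))  # the element 1
```

The search gives δ·2 = −1·2 + 2·(1 + δ) and δ·(1 + δ) = −3·2 + 1·(1 + δ). The element 1 has no integer coordinates, so A is a proper ideal.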


Figure 13.3.4 shows this ideal. 


Theorem 13.3.3 Let R = Z[δ], where δ = √−5, and let A be a nonzero ideal of R. Let α be
a nonzero element of A of minimal norm (or minimal absolute value). Then either

• The set (α, αδ) is a lattice basis for A, and A is the principal ideal (α), or
• The set (α, (1/2)(α + αδ)) is a lattice basis for A, and A is not a principal ideal.

This theorem has the following geometric interpretation: The lattice basis (α, αδ)
of the principal ideal (α) is obtained from the lattice basis (1, δ) of the unit ideal R by
multiplying by α. If we write α in polar coordinates α = re^{iθ}, then multiplication by α
rotates the complex plane through the angle θ and stretches by the factor r. So all principal
ideals are similar geometric figures. Also, the lattice with basis (α, (1/2)(α + αδ)) is obtained
from the lattice (2, 1 + δ) by multiplying by (1/2)α. All ideals of the second type are geometric
figures similar to the one shown below (see also Figure 13.7.4).


[Figure]
(13.3.4) The Ideal (2, 1 + δ) in the Ring Z[√−5].

Similarity classes of ideals are called ideal classes, and the number of ideal classes is the
class number of R. The theorem asserts that the class number of Z[√−5] is two. Ideal classes
for other quadratic imaginary fields are discussed in Section 13.7.


Theorem 13.3.3 is based on the following simple lemma about lattices: 


Lemma 13.3.5 Let A be a lattice in the complex plane, let r be the minimum absolute value
among nonzero elements of A, and let γ be an element of A. Let n be a positive integer.
The interior of the disk of radius (1/n)r about the point (1/n)γ contains no element of A other than
the center (1/n)γ. The center may lie in A or not.

Proof. If β is an element of A in the interior of the disk, then |β − (1/n)γ| < (1/n)r, which is to
say, |nβ − γ| < r. Moreover, nβ − γ is in A. Since this is an element of absolute value less
than the minimum, nβ − γ = 0. Then β = (1/n)γ is the center of the disk. □


Proof of Theorem 13.3.3. Let a bea nonzero element of an ideal A of minimal absolute value 
r.Since A contains a, it contains the principal ideal (@), andif A = (a) we are in the first case. 


Suppose that $A$ contains an element $\beta$ not in the principal ideal $(\alpha)$. The ideal $(\alpha)$ has
the lattice basis $B = (\alpha, \alpha\delta)$, so we may choose $\beta$ to lie in the parallelogram $\Pi(B)$ of linear
combinations $r\alpha + s\alpha\delta$ with $0 \le r, s \le 1$. (In fact, we can choose $\beta$ so that $0 \le r, s < 1$. See
Lemma 13.10.2.) Because $\delta$ is purely imaginary, the parallelogram is a rectangle. How large
the rectangle is, and how it is situated in the plane, depend on $\alpha$, but the ratio of the side
lengths is always $1 : \sqrt5$. We'll be done if we show that $\beta$ is the midpoint $\tfrac12(\alpha + \alpha\delta)$ of the
rectangle.


Figure 13.3.6 shows disks of radius $r$ about the four vertices of such a rectangle, and
also disks of radius $\tfrac12 r$ about three half lattice points, $\tfrac12\alpha\delta$, $\tfrac12(\alpha + \alpha\delta)$, and $\alpha + \tfrac12\alpha\delta$. Notice
that the interiors of these seven disks cover the rectangle. (It would be fussy to check this by
algebra. Let's not bother. A glance at the figure makes it clear enough.)
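For readers who would like some reassurance beyond the figure, here is a minimal numerical sketch of the covering claim. It assumes the normalization $\alpha = 1$, $r = 1$, so the rectangle has vertices $0$, $1$, $\sqrt5\,i$, $1 + \sqrt5\,i$; the function names and the grid resolution are illustrative choices, and a grid check is a sanity test rather than a proof.

```python
import math

SQRT5 = math.sqrt(5)

def covered(x, y):
    """True if (x, y) lies in the open interior of one of the seven disks."""
    # Radius-1 disks about the four vertices of the 1 x sqrt(5) rectangle.
    vertices = [(0, 0), (1, 0), (0, SQRT5), (1, SQRT5)]
    # Radius-1/2 disks about the three half lattice points.
    halves = [(0, SQRT5 / 2), (0.5, SQRT5 / 2), (1, SQRT5 / 2)]
    in_big = any(math.hypot(x - a, y - b) < 1 for a, b in vertices)
    in_small = any(math.hypot(x - a, y - b) < 0.5 for a, b in halves)
    return in_big or in_small

def rectangle_is_covered(steps=200):
    """Test every point of a (steps+1) x (steps+1) grid on the closed rectangle."""
    return all(
        covered(i / steps, SQRT5 * j / steps)
        for i in range(steps + 1)
        for j in range(steps + 1)
    )

print(rectangle_is_covered())
```

The center of the rectangle is at distance $\sqrt{3/2} > 1$ from every vertex, so the three small disks really are needed.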


According to Lemma 13.3.5, the only points of the interiors of the disks that can be
elements of $A$ are their centers. Since $\beta$ is not in the principal ideal $(\alpha)$, it is not a vertex of the
rectangle. So $\beta$ must be one of the three half lattice points. If $\beta = \alpha + \tfrac12\alpha\delta$, then since $\alpha$ is in
$A$, $\tfrac12\alpha\delta$ will be in $A$ too. So we have only two cases to consider: $\beta = \tfrac12\alpha\delta$ and $\beta = \tfrac12(\alpha + \alpha\delta)$.


This exhausts the information we can get from the fact that $A$ is a lattice. We now use the
fact that $A$ is an ideal. Suppose that $\tfrac12\alpha\delta$ is in $A$. Multiplying by $\delta$ shows that $\tfrac12\alpha\delta^2 = -\tfrac52\alpha$ is in
$A$. Then since $\alpha$ is in $A$, $\tfrac12\alpha$ is in $A$ too. This contradicts our choice of $\alpha$ as a nonzero element
of minimal absolute value. So $\beta$ cannot be equal to $\tfrac12\alpha\delta$. The remaining possibility is that $\beta$
is the center $\tfrac12(\alpha + \alpha\delta)$ of the rectangle. If so, we are in the second case of the theorem. □


13.4 IDEAL MULTIPLICATION 


Let $R$ be the ring of integers in an imaginary quadratic number field. As usual, the notation
$A = (\alpha, \beta, \ldots, \gamma)$ means that $A$ is the ideal of $R$ generated by the elements $\alpha, \beta, \ldots, \gamma$.
It consists of all linear combinations of those elements, with coefficients in the ring.

Since a nonzero ideal $A$ is a lattice, it has a lattice basis $(\alpha, \beta)$ consisting of two
elements. Every element of $A$ is an integer combination of $\alpha$ and $\beta$. We must be careful to
distinguish between the concepts of a lattice basis and a generating set for an ideal. Any
lattice basis generates the ideal, but the converse is false. For instance, a principal ideal is
generated as an ideal by a single element, whereas a lattice basis has two elements.

Dedekind extended the notion of divisibility to ideals using the following definition of 
ideal multiplication: 


• Let $A$ and $B$ be ideals in a ring $R$. The product ideal $AB$ consists of all finite sums of products

(13.4.1)  $\sum_i \alpha_i\beta_i$, with $\alpha_i$ in $A$ and $\beta_i$ in $B$.

This is the smallest ideal of $R$ that contains all of the products $\alpha\beta$.


The definition of ideal multiplication may not be quite as simple as one might hope, 
but it works well. Notice that it is a commutative and associative law, and that it has a unit 
element, namely R. (This is one of the reasons that R is called the unit ideal.) 


(13.4.2) AB=BA, A(BC)=(AB)C, AR=RA=A. 
We omit the proof of the next proposition, which is true for arbitrary rings. 


Proposition 13.4.3 Let A and B be ideals of a ring R. 

(a) Let $\{\alpha_1, \ldots, \alpha_m\}$ and $\{\beta_1, \ldots, \beta_n\}$ be generators for the ideals $A$ and $B$, respectively.
The product ideal $AB$ is generated as ideal by the $mn$ products $\alpha_i\beta_j$: Every element of
$AB$ is a linear combination of these products with coefficients in the ring.

(b) The product of principal ideals is principal: If $A = (\alpha)$ and $B = (\beta)$, then $AB$ is the
principal ideal $(\alpha\beta)$ generated by the product $\alpha\beta$.

(c) Assume that $A = (\alpha)$ is a principal ideal and let $B$ be arbitrary. Then $AB$ is the set of
products $\alpha\beta$ with $\beta$ in $B$: $AB = \alpha B$. □


We go back to the example of the ring $R = \mathbb{Z}[\delta]$ with $\delta^2 = -5$, in which

(13.4.4)  $2 \cdot 3 = 6 = (1 + \delta)(1 - \delta)$.


If factoring in $R$ were unique, there would be an element $\gamma$ in $R$ dividing both $2$ and $1 + \delta$,
and then $2$ and $1 + \delta$ would be in the principal ideal $(\gamma)$. There is no such element. However,
there is an ideal that contains $2$ and $1 + \delta$, namely the ideal $(2, 1 + \delta)$ generated by these two
elements, the one depicted in Figure 13.3.4.

We can make four ideals using the factors of 6: 


(13.4.5)  $A = (2, 1 + \delta)$,  $\bar{A} = (2, 1 - \delta)$,  $B = (3, 1 + \delta)$,  $\bar{B} = (3, 1 - \delta)$.


In each of these ideals, the generators that are given happen to form lattice bases. We denote
the last of them by $\bar{B}$ because it is the complex conjugate of $B$:


(13.4.6)  $\bar{B} = \{\bar\beta \mid \beta \in B\}$.

It is obtained by reflecting $B$ about the real axis. The fact that $\bar{R} = R$ implies that the
complex conjugate of an ideal is an ideal. The ideal $\bar{A}$, the complex conjugate of $A$, is equal
to $A$. This accidental symmetry of the lattice $A$ doesn't occur very often.

We now compute some product ideals. Proposition 13.4.3(a) tells us that the ideal $\bar{A}A$
is generated by the four products of the generators $(2, 1 - \delta)$ and $(2, 1 + \delta)$ of $\bar{A}$ and $A$:


$\bar{A}A = (4,\ 2 + 2\delta,\ 2 - 2\delta,\ 6)$.


Each of the four generators is divisible by $2$, so $\bar{A}A$ is contained in the principal ideal $(2)$.
(The notation $(2)$ stands for the ideal $2R$ here.) On the other hand, $2$ is an element of $\bar{A}A$
because $2 = 6 - 4$. Therefore $(2) \subset \bar{A}A$. This shows that $\bar{A}A = (2)$.

Next, the product $AB$ is generated by four products:


$AB = (6,\ 2 + 2\delta,\ 3 + 3\delta,\ (1 + \delta)^2)$.


Each of these four elements is divisible by $1 + \delta$, and $1 + \delta$ is the difference of two of them,
so it is an element of $AB$. Therefore $AB$ is equal to the principal ideal $(1 + \delta)$. One sees
similarly that $\bar{A}\bar{B} = (1 - \delta)$ and that $\bar{B}B = (3)$.

The principal ideal (6) is the product of four ideals: 


(13.4.7)  $(6) = (2)(3) = (\bar{A}A)(\bar{B}B) = (\bar{A}\bar{B})(AB) = (1 - \delta)(1 + \delta)$


Isn't this beautiful? The ideal factorization $(6) = \bar{A}A\bar{B}B$ has provided a common refinement
of the two factorizations (13.4.4).
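The product computations above can be checked mechanically. The sketch below is one possible way to do it, not a method taken from the text: it stores $a + b\delta$ as the pair `(a, b)`, turns a generating set into a canonical lattice basis by a Euclidean reduction, and compares ideals by comparing those bases. The names `mul`, `ideal`, and `product` are hypothetical helpers introduced only for this illustration.

```python
from math import gcd

D = -5  # delta**2; an element a + b*delta is stored as the pair (a, b)

def mul(x, y):
    """Product of two elements of Z[delta]."""
    (a, b), (c, e) = x, y
    return (a * c + D * b * e, a * e + b * c)

def ideal(gens):
    """Canonical lattice basis ((c, e), (a, 0)) of the nonzero ideal generated by gens."""
    # As a lattice, the ideal is spanned by g and g*delta for each generator g.
    vectors = [v for g in gens for v in (g, mul(g, (0, 1)))]
    u, a = (0, 0), 0
    for v in vectors:
        # Euclidean algorithm on the delta-coordinates of u and v.
        while v[1] != 0:
            q = u[1] // v[1]
            u, v = v, (u[0] - q * v[0], u[1] - q * v[1])
        a = gcd(a, v[0])  # v is now an ordinary integer lying in the ideal
    if u[1] < 0:
        u = (-u[0], -u[1])
    return ((u[0] % a, u[1]), (a, 0))

def product(gens_a, gens_b):
    """Ideal generated by all products of generators, as in Proposition 13.4.3(a)."""
    return ideal([mul(x, y) for x in gens_a for y in gens_b])

A, A_bar = [(2, 0), (1, 1)], [(2, 0), (1, -1)]
B, B_bar = [(3, 0), (1, 1)], [(3, 0), (1, -1)]

print(product(A_bar, A) == ideal([(2, 0)]))       # A-bar A = (2)
print(product(A, B) == ideal([(1, 1)]))           # A B = (1 + delta)
print(product(A_bar, B_bar) == ideal([(1, -1)]))  # A-bar B-bar = (1 - delta)
print(product(B_bar, B) == ideal([(3, 0)]))       # B-bar B = (3)
```

The same comparison also confirms that $\bar{A} = A$ while $\bar{B} \ne B$: `ideal(A) == ideal(A_bar)` holds, and `ideal(B) == ideal(B_bar)` does not.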


In the next section, we prove unique factorization of ideals in the ring of integers of 
any imaginary quadratic number field. The next lemma is the tool that we will need. 


Lemma 13.4.8 Main Lemma. Let $R$ be the ring of integers in an imaginary quadratic number
field. The product of a nonzero ideal $A$ of $R$ and its conjugate $\bar{A}$ is a principal ideal, generated
by a positive ordinary integer $n$: $\bar{A}A = (n) = nR$.


This lemma would be false for any ring smaller than $R$, for example, if one didn't include
the elements with half integer coefficients, when $d \equiv 1$ modulo 4.


Proof. Let $(\alpha, \beta)$ be a lattice basis for the ideal $A$. Then $(\bar\alpha, \bar\beta)$ is a lattice basis for $\bar{A}$.
Moreover, $\bar{A}$ and $A$ are generated as ideals by these bases, so the four products $\bar\alpha\alpha$, $\bar\alpha\beta$,
$\bar\beta\alpha$, and $\bar\beta\beta$ generate the product ideal $\bar{A}A$. The three elements $\bar\alpha\alpha$, $\bar\beta\beta$, and $\bar\beta\alpha + \bar\alpha\beta$ are
in $\bar{A}A$. They are algebraic integers equal to their complex conjugates, so they are rational
numbers, and therefore ordinary integers (13.1.1). Let $n$ be their greatest common divisor in
the ring of integers. It is an integer combination of those elements, so it is also an element of
$\bar{A}A$. Therefore $(n) \subset \bar{A}A$. If we show that $n$ divides each of the four generators of $\bar{A}A$ in
$R$, it will follow that $(n) = \bar{A}A$, and this will prove the lemma.


By construction, $n$ divides $\bar\alpha\alpha$ and $\bar\beta\beta$ in $\mathbb{Z}$, hence in $R$. We have to show that $n$ divides
$\bar\alpha\beta$ and $\bar\beta\alpha$. How can we do this? There is a beautiful insight here. We use the definition of
an algebraic integer. If we show that the quotients $\gamma = \bar\alpha\beta/n$ and $\bar\gamma = \bar\beta\alpha/n$ are algebraic
integers, it will follow that they are elements of the ring of integers, which is $R$. This will
mean that $n$ divides $\bar\alpha\beta$ and $\bar\beta\alpha$ in $R$.

The elements $\gamma$ and $\bar\gamma$ are roots of the polynomial $p(x) = x^2 - (\bar\gamma + \gamma)x + (\bar\gamma\gamma)$:


$\bar\gamma + \gamma = \dfrac{\bar\beta\alpha + \bar\alpha\beta}{n}$  and  $\bar\gamma\gamma = \dfrac{\bar\beta\alpha}{n} \cdot \dfrac{\bar\alpha\beta}{n} = \dfrac{\bar\alpha\alpha}{n} \cdot \dfrac{\bar\beta\beta}{n}$.

By its definition, $n$ divides each of the three integers $\bar\beta\alpha + \bar\alpha\beta$, $\bar\alpha\alpha$, and $\bar\beta\beta$. The coefficients
of $p(x)$ are integers, so $\gamma$ and $\bar\gamma$ are algebraic integers, as we hoped. (See Lemma 12.4.2 for
the case that $\gamma$ happens to be a rational number.) □
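As an illustration (a worked instance added here, not part of the proof), the steps can be traced for the ideal $A = (2, 1 + \delta)$ of $\mathbb{Z}[\delta]$, $\delta^2 = -5$, with the lattice basis $\alpha = 2$, $\beta = 1 + \delta$:

```latex
\begin{align*}
\bar\alpha\alpha &= 4, \qquad
\bar\beta\beta = (1-\delta)(1+\delta) = 6, \qquad
\bar\beta\alpha + \bar\alpha\beta = 2(1-\delta) + 2(1+\delta) = 4,\\
n &= \gcd(4,\,6,\,4) = 2,\\
\gamma &= \bar\alpha\beta/n = 1+\delta, \qquad
p(x) = x^2 - (\bar\gamma+\gamma)\,x + \bar\gamma\gamma = x^2 - 2x + 6 .
\end{align*}
```

The value $n = 2$ agrees with the direct computation $\bar{A}A = (2)$ made earlier in this section, and $p(x)$ has integer coefficients, as the proof predicts.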


Our first applications of the Main Lemma are to divisibility of ideals. In analogy with 
divisibility of elements of a ring, we say that an ideal A divides another ideal B if there is an 
ideal C such that B is the product ideal AC. 


Corollary 13.4.9 Let R be the ring of integers in an imaginary quadratic number field. 


(a) Cancellation Law: Let $A$, $B$, $C$ be nonzero ideals of $R$. Then $AB = AC$ if and only if
$B = C$. Similarly, $AB \subset AC$ if and only if $B \subset C$, and $AB < AC$ if and only if $B < C$.

(b) Let $A$ and $B$ be nonzero ideals of $R$. Then $A \supset B$ if and only if $A$ divides $B$, i.e., if and
only if there is an ideal $C$ such that $B = AC$.


Proof. (a) It is clear that if $B = C$, then $AB = AC$. If $AB = AC$, then $\bar{A}AB = \bar{A}AC$. By
the Main Lemma, $\bar{A}A = (n)$, so $nB = nC$. Dividing by $n$ shows that $B = C$. The other
assertions are proved in the same way.


(b) We first consider the case that a principal ideal $(n)$ generated by an ordinary integer $n$
contains an ideal $B$. Then $n$ divides every element of $B$ in $R$. Let $C = n^{-1}B$ be the set of
quotients, the set of elements $n^{-1}\beta$ with $\beta$ in $B$. You can check that $C$ is an ideal and that
$nC = B$. Then $B$ is the product ideal $(n)C$, so $(n)$ divides $B$.

Now suppose that an ideal $A$ contains $B$. We apply the Main Lemma again: $\bar{A}A = (n)$.
Then $(n) = \bar{A}A$ contains $\bar{A}B$. By what has been shown, there is an ideal $C$ such that
$\bar{A}B = (n)C = \bar{A}AC$. By the Cancellation Law, $B = AC$.


Conversely, if $A$ divides $B$, say $B = AC$, then $B = AC \subset AR = A$. □


13.5 FACTORING IDEALS 


We show in this section that nonzero ideals in rings of integers in imaginary quadratic fields 
factor uniquely. This follows rather easily from the Main Lemma 13.4.8 and its Corollary 
13.4.9, but before deriving it, we define the concept of a prime ideal. We do this to be consistent 
with standard terminology: the prime ideals that appear are simply the maximal ideals. 


Proposition 13.5.1 Let $R$ be a ring. The following conditions on an ideal $P$ of $R$ are
equivalent. An ideal that satisfies these conditions is called a prime ideal.

(a) The quotient ring $R/P$ is an integral domain.

(b) $P \ne R$, and if $a$ and $b$ are elements of $R$ such that $ab \in P$, then $a \in P$ or $b \in P$.

(c) $P \ne R$, and if $A$ and $B$ are ideals of $R$ such that $AB \subset P$, then $A \subset P$ or $B \subset P$.


Condition (b) explains the term "prime." It mimics the important property of a prime
integer, that if a prime $p$ divides a product $ab$ of integers, then $p$ divides $a$ or $p$ divides $b$.

Proof. (a) $\Leftrightarrow$ (b): The conditions for $R/P$ to be an integral domain are that $R/P \ne \{0\}$
and $\bar{a}\bar{b} = 0$ implies $\bar{a} = 0$ or $\bar{b} = 0$. These conditions translate to $P \ne R$ and $ab \in P$ implies
$a \in P$ or $b \in P$.


(b) $\Rightarrow$ (c): Suppose that $ab \in P$ implies $a \in P$ or $b \in P$, and let $A$ and $B$ be ideals such that
$AB \subset P$. If $A \not\subset P$, there is an element $a$ in $A$ that isn't in $P$. Let $b$ be any element of $B$.
Then $ab$ is in $AB$ and therefore in $P$. But $a$ is not in $P$, so $b$ is in $P$. Since $b$ was an arbitrary
element of $B$, $B \subset P$.


(c) $\Rightarrow$ (b): Suppose that $P$ has the property (c), and let $a$ and $b$ be elements of $R$ such that
$ab$ is in $P$. The principal ideal $(ab)$ is the product ideal $(a)(b)$. If $ab \in P$, then $(ab) \subset P$,
and so $(a) \subset P$ or $(b) \subset P$. This tells us that $a \in P$ or $b \in P$. □


Corollary 13.5.2 Let R be a ring. 
(a) The zero ideal of R is a prime ideal if and only if R is an integral domain. 


(b) A maximal ideal of R is a prime ideal. 
(c) A principal ideal $(\alpha)$ is a prime ideal of $R$ if and only if $\alpha$ is a prime element of $R$.


Proof. (a) This follows from (13.5.1)(a), because the quotient ring $R/(0)$ is isomorphic to $R$.


(b) This also follows from (13.5.1)(a), because when $M$ is a maximal ideal, $R/M$ is a field.
A field is an integral domain, so $M$ is a prime ideal. Finally, (c) restates (13.5.1)(b) for a
principal ideal. □


This completes our discussion of prime ideals in arbitrary rings, and we go back to the 
ring of integers in an imaginary quadratic number field. 


Corollary 13.5.3 Let R be the ring of integers in an imaginary quadratic number field, let A 
and B be ideals of R, and let P be a prime ideal of R that is not the zero ideal. If P divides 
the product ideal AB, then P divides one of the factors A or B. 


This follows from (13.5.1)(c) when we use (13.4.9)(b) to translate inclusion into divisibility. □


Lemma 13.5.4 Let R be the ring of integers in an imaginary quadratic number field, and let 
B be a nonzero ideal of R. Then 

(a) $B$ has finite index in $R$,

(b) there are finitely many ideals of $R$ that contain $B$,

(c) $B$ is contained in a maximal ideal, and

(d) $B$ is a prime ideal if and only if it is a maximal ideal.


Proof. (a) is Lemma 13.10.3(d), and (b) follows from Corollary 13.10.5.

(c) Among the finitely many ideals that contain $B$, there must be at least one that is maximal.


(d) Let $P$ be a nonzero prime ideal. Then by (a), $P$ has finite index in $R$. So $R/P$ is a
finite integral domain. A finite integral domain is a field. (This is Chapter 11, Exercise 7.1.)
Therefore $P$ is a maximal ideal. The converse is (13.5.2)(b). □


Theorem 13.5.5 Let $R$ be the ring of integers in an imaginary quadratic field $F$. Every
proper ideal of $R$ is a product of prime ideals. The factorization of an ideal into prime ideals
is unique except for the ordering of the factors.


Proof. If an ideal B is a maximal ideal, it is itself a prime ideal. Otherwise, there is an ideal 
A that properly contains B. Then A divides B, say B = AC. The cancellation law shows 
that C properly contains B too. We continue by factoring A and C. Since only finitely many 
ideals contain B, the process terminates, and when it does, all factors will be maximal and 
therefore prime. 

If $P_1 \cdots P_r = Q_1 \cdots Q_s$, with $P_i$ and $Q_j$ prime, then $P_1$ divides $Q_1 \cdots Q_s$, and
therefore $P_1$ divides one of the factors, say $Q_1$. Then $P_1$ contains $Q_1$, and since $Q_1$ is
maximal, $P_1 = Q_1$. The uniqueness of factorization follows by induction when one cancels
$P_1$ from both sides of the equation. □


Note: This theorem extends to rings of algebraic integers in other number fields, but it is a
very special property. Most rings do not admit unique factorization of ideals. The reason is
that in most rings, $P \supset B$ does not imply that $P$ divides $B$, and then the analogy between
prime ideals and prime elements is weaker. □


Theorem 13.5.6 The ring of integers R in an imaginary quadratic number field is a unique 
factorization domain if and only if it is a principal ideal domain, and this is true if and only if 
the class group C of R is the trivial group. 


Proof. A principal ideal domain is a unique factorization domain (12.2.14). Conversely, 
suppose that R is a unique factorization domain. We must show that every ideal is principal. 
Since the product of principal ideals is principal and since every nonzero ideal is a product 
of prime ideals, it suffices to show that every nonzero prime ideal is principal. 


Let $P$ be a nonzero prime ideal of $R$, and let $\alpha$ be a nonzero element of $P$. Then $\alpha$ is
a product of irreducible elements, and because $R$ has unique factorization, they are prime
elements (12.2.14). Since $P$ is a prime ideal, $P$ contains one of the prime factors of $\alpha$, say $\pi$.
Then $P$ contains the principal ideal $(\pi)$. But since $\pi$ is a prime element, the principal ideal
$(\pi)$ is a nonzero prime ideal, and therefore a maximal ideal. Since $P$ contains $(\pi)$, $P = (\pi)$.
So $P$ is a principal ideal. □


13.6 PRIME IDEALS AND PRIME INTEGERS 


In Section 12.5, we saw how Gauss primes are related to integer primes. A similar analysis 
can be made for the ring R of integers in a quadratic number field, but we should speak of 
prime ideals rather than of prime elements. This complicates the analogues of some parts of 
Theorem 12.5.2. We consider only those parts that extend directly. 


Theorem 13.6.1 Let R be the ring of integers in an imaginary quadratic number field. 


(a) Let $P$ be a nonzero prime ideal of $R$. Say that $\bar{P}P = (n)$ where $n$ is a positive integer.
Then $n$ is either an integer prime or the square of an integer prime.

(b) Let $p$ be an integer prime. The principal ideal $(p) = pR$ is either a prime ideal, or the
product $\bar{P}P$ of a prime ideal and its conjugate.

(c) Assume that $d \equiv 2$ or $3$ modulo 4. An integer prime $p$ generates a prime ideal $(p)$ of $R$
if and only if $d$ is not a square modulo $p$, and this is true if and only if the polynomial
$x^2 - d$ is irreducible in $\mathbb{F}_p[x]$.

(d) Assume that $d \equiv 1$ modulo 4, and let $h = \tfrac14(1 - d)$. An integer prime $p$ generates a
prime ideal $(p)$ of $R$ if and only if the polynomial $x^2 - x + h$ is irreducible in $\mathbb{F}_p[x]$.


Corollary 13.6.2 With the notation as in the theorem, any proper ideal strictly larger than 
$(p)$ is a prime, and therefore a maximal, ideal. □


• An integer prime $p$ is said to remain prime if the principal ideal $(p) = pR$ is a prime ideal.
Otherwise, the principal ideal $(p)$ is a product $\bar{P}P$ of a prime ideal and its conjugate, and in
this case the prime $p$ is said to split. If in addition $\bar{P} = P$, the prime $p$ is said to ramify.


Going back to the case $d = -5$, the prime $2$ ramifies in $\mathbb{Z}[\sqrt{-5}]$ because $(2) = \bar{A}A$ and
$\bar{A} = A$. The prime $3$ splits. It does not ramify, because $(3) = \bar{B}B$ and $\bar{B} \ne B$ (see (13.4.5)).
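The criteria in Theorem 13.6.1(c),(d) are easy to test by brute force. The sketch below counts the roots modulo $p$ of $x^2 - d$ (or of $x^2 - x + h$ when $d \equiv 1$ modulo 4). The function name `prime_behavior` is a hypothetical helper, and the rule it uses to detect ramification (a repeated root modulo $p$) is a standard fact assumed here; it is not proved in this section.

```python
def prime_behavior(d, p):
    """Classify the integer prime p in the ring of integers of Q[sqrt(d)], d < 0 square-free."""
    if d % 4 == 1:
        h = (1 - d) // 4
        f = lambda x: x * x - x + h   # Theorem 13.6.1(d)
    else:
        f = lambda x: x * x - d       # Theorem 13.6.1(c)
    roots = [x for x in range(p) if f(x) % p == 0]
    if not roots:
        return "remains prime"        # the polynomial is irreducible modulo p
    # Assumption: a single (repeated) root modulo p corresponds to ramification.
    return "ramifies" if len(roots) == 1 else "splits"

for p in (2, 3, 7, 11):
    print(p, prime_behavior(-5, p))
```

For $d = -5$ this reports that $2$ ramifies and $3$ splits, in agreement with the discussion above.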


Proof of Theorem 13.6.1. The proof follows that of Theorem 12.5.2 closely, so we omit the
proofs of (a) and (b). We discuss (c) in order to review the reasoning. Suppose $d \equiv 2$ or $3$
modulo 4. Then $R = \mathbb{Z}[\delta]$ is isomorphic to the quotient ring $\mathbb{Z}[x]/(x^2 - d)$. A prime integer
$p$ remains prime in $R$ if and only if $\tilde{R} = R/(p)$ is a field. (We are using a tilde here to avoid
confusion with complex conjugation.) This leads to the diagram


(13.6.3)
$$
\begin{array}{ccc}
\mathbb{Z}[x] & \xrightarrow{\ \text{kernel }(p)\ } & \mathbb{F}_p[x] \\
\Big\downarrow{\scriptstyle\ \text{kernel }(x^2-d)} & & \Big\downarrow{\scriptstyle\ \text{kernel }(x^2-d)} \\
\mathbb{Z}[\delta] & \xrightarrow{\ \text{kernel }(p)\ } & \tilde{R}
\end{array}
$$


This diagram shows that $\tilde{R}$ is a field if and only if $x^2 - d$ is irreducible in $\mathbb{F}_p[x]$.
The proof of (d) is similar. □


Proposition 13.6.4 Let $A$, $B$, $C$ be nonzero ideals with $B \supset C$. The index $[B:C]$ of $C$ in $B$
is equal to the index $[AB:AC]$.


Proof. Since $A$ is a product of prime ideals, it suffices to show that $[B:C] = [PB:PC]$ when
$P$ is a nonzero prime ideal. The statement for an arbitrary ideal $A$ follows when we multiply by
one prime ideal at a time.


There is a prime integer $p$ such that either $P = (p)$ or $\bar{P}P = (p)$ (13.6.1). If $P$ is the
principal ideal $(p)$, the formula to be shown is $[B:C] = [pB:pC]$, and this is rather obvious
(see (13.10.3)(c)).


Suppose that $(p) = \bar{P}P$. We inspect the chain of ideals $B \supset PB \supset \bar{P}PB = pB$.
The cancellation law shows that the inclusions are strict, and $[B:pB] = p^2$. Therefore
$[B:PB] = p$. Similarly, $[C:PC] = p$ (13.10.3)(b). The diagram below, together with the
multiplicative property of the index (2.8.14), shows that $[B:C] = [PB:PC]$.


$$
\begin{array}{ccc}
B & \supset & C \\
\cup & & \cup \\
PB & \supset & PC
\end{array}
$$

13.7 IDEAL CLASSES 


As before, R denotes the ring of integers in an imaginary quadratic number field. We 
have seen that R is a principal ideal domain if and only if it is a unique factorization 
domain (13.5.6). We define an equivalence relation on nonzero ideals that is compatible with 
multiplication of ideals, and such that the principal ideals form one equivalence class. 


• Two nonzero ideals $A$ and $A'$ of $R$ are similar if, for some complex number $\lambda$,

(13.7.1)  $A' = \lambda A$.


Similarity of ideals is an equivalence relation whose geometric interpretation was mentioned 
before: A and A’ are similar if and only if, when regarded as lattices in the complex plane, they 
are similar geometric figures, by a similarity that is orientation-preserving. To see this, we 
note that a lattice looks the same at all of its points. So a geometric similarity can be assumed 
to relate the element 0 of A to the element 0 of A’. Then it will be described as a rotation 
followed by a stretching or shrinking, that is, as multiplication by a complex number $\lambda$.


• Similarity classes of ideals are called ideal classes. The class of an ideal $A$ will be denoted
by $(A)$.


Lemma 13.7.2 The class (R) of the unit ideal consists of the principal ideals. 


Proof. If $(A) = (R)$, then $A = \lambda R$ for some complex number $\lambda$. Since $1$ is in $R$, $\lambda$ is an
element of $A$, and therefore an element of $R$. Then $A$ is the principal ideal $(\lambda)$. □


We saw in (13.3.3) that there are two ideal classes in the ring $R = \mathbb{Z}[\delta]$, when $\delta^2 = -5$.
Both of the ideals $A = (2, 1 + \delta)$ and $B = (3, 1 + \delta)$ represent the class of nonprincipal
ideals. They are shown below, in Figure 13.7.4. Rectangles have been put into the figure to 
help you visualize the fact that the two lattices are similar geometric figures. 

We see below (Theorem 13.7.10) that there are always finitely many ideal classes. The 
number of ideal classes in R is called the class number of R. 


Proposition 13.7.3 The ideal classes form an abelian group $C$, the class group of $R$, the law
of composition being defined by multiplication of ideals: $(A)(B) = (AB)$:


(class of $A$)(class of $B$) = (class of $AB$).

Proof. Suppose that $(A) = (A')$ and $(B) = (B')$, i.e., $A' = \lambda A$ and $B' = \gamma B$ for some
complex numbers $\lambda$ and $\gamma$. Then $A'B' = \lambda\gamma AB$, and therefore $(AB) = (A'B')$. This shows
that the law of composition is well defined. The law is commutative and associative because
multiplication of ideals is commutative and associative, and the class $(R)$ of the unit ideal is
an identity element that we denote by $1$, as usual. The only group axiom that isn't obvious
is that every class $(A)$ has an inverse. But this follows from the Main Lemma, which asserts
that $\bar{A}A$ is a principal ideal $(n)$. Since the class of a principal ideal is $1$, $(\bar{A})(A) = 1$ and
$(\bar{A}) = (A)^{-1}$. □


The class number is thought of as a way to quantify how badly unique factorization 
of elements fails. More precise information is given by the structure of $C$ as a group. As we
have seen, the class number of the ring $R = \mathbb{Z}[\sqrt{-5}]$ is two. The class group of $R$ has order
two. One consequence of this is that the product of any two nonprincipal ideals of R is a 
principal ideal. We saw several examples of this in (13.4.7). 


[Figure: the lattices of the ideals $A$ and $B$, with rectangles showing that they are similar]
(13.7.4) The Ideals $A = (2, 1 + \delta)$ and $B = (3, 1 + \delta)$, $\delta^2 = -5$.


Measuring an Ideal 


The Main Lemma tells us that if $A$ is a nonzero ideal, then $\bar{A}A = (n)$ is the principal
ideal generated by a positive integer. That integer is defined to be the norm of $A$. It will be
denoted by $N(A)$:


(13.7.5)  $N(A) = n$, if $n$ is the positive integer such that $\bar{A}A = (n)$.


The norm of an ideal is analogous to the norm of an element. As is true for norms of 
elements, this norm is multiplicative. 


Lemma 13.7.6 If $A$ and $B$ are nonzero ideals, then $N(AB) = N(A)N(B)$. Moreover, the
norm of the principal ideal $(\alpha)$ is equal to $N(\alpha)$, the norm of the element $\alpha$.


Proof. Say that $N(A) = m$ and $N(B) = n$. This means that $\bar{A}A = (m)$ and $\bar{B}B = (n)$.
Then $(\overline{AB})(AB) = (\bar{A}A)(\bar{B}B) = (m)(n) = (mn)$. So $N(AB) = mn$.


Next, suppose that $A$ is the principal ideal $(\alpha)$, and let $n = N(\alpha)$ $(= \bar\alpha\alpha)$. Then
$\bar{A}A = (\bar\alpha)(\alpha) = (\bar\alpha\alpha) = (n)$, so $N(A) = n$ too. □


We now have four ways to measure the size of an ideal $A$:


• the norm $N(A)$,

• the index $[R:A]$ of $A$ in $R$,

• the area $\Delta(A)$ of the parallelogram spanned by a lattice basis for $A$,

• the minimum value taken on by the norm $N(\alpha)$, of the nonzero elements of $A$.


The relations among these measures are given by Theorem 13.7.8 below. To state that 
theorem, we need a peculiar number: 


(13.7.7)  $\mu = \begin{cases} 2\sqrt{|d|/3} & \text{if } d \equiv 2 \text{ or } 3 \pmod 4 \\ \sqrt{|d|/3} & \text{if } d \equiv 1 \pmod 4. \end{cases}$


Theorem 13.7.8 Let $R$ be the ring of integers in an imaginary quadratic number field, and
let $A$ be a nonzero ideal of $R$. Then

(a) $N(A) = [R:A] = \dfrac{\Delta(A)}{\Delta(R)}$.

(b) If $\alpha$ is a nonzero element of $A$ of minimal norm, $N(\alpha) \le N(A)\mu$.


The most important point about (b) is that the coefficient $\mu$ doesn't depend on the ideal.
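As a quick check on a familiar case (an illustration added here, using only facts from Section 13.4), take $A = (2, 1 + \delta)$ in $\mathbb{Z}[\delta]$ with $\delta^2 = -5$. Its generators form a lattice basis, and $\bar{A}A = (2)$, so $N(A) = 2$:

```latex
\begin{align*}
\Delta(A) &= \left|\det\begin{pmatrix} 2 & 1 \\ 0 & \sqrt5 \end{pmatrix}\right| = 2\sqrt5,
\qquad \Delta(R) = \sqrt5,
\qquad \frac{\Delta(A)}{\Delta(R)} = 2 = N(A),\\
N\bigl(2a + b(1+\delta)\bigr) &= (2a+b)^2 + 5b^2 \;\ge\; 4 \quad\text{for } (a,b) \ne (0,0),
\qquad N(A)\,\mu = 2\cdot 2\sqrt{5/3} \approx 5.16 .
\end{align*}
```

The minimal norm $4$ is attained by the element $2$, and it is indeed at most $N(A)\mu$.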


Proof. (a) We refer to Proposition 13.10.6 for the proof that $[R:A] = \dfrac{\Delta(A)}{\Delta(R)}$. In outline, the
proof that $N(A) = [R:A]$ is as follows. Reference numbers have been put over the equality
symbols. Let $n = N(A)$. Then


$n^2 \overset{1}{=} [R:nR] \overset{2}{=} [R:\bar{A}A] \overset{3}{=} [R:A][A:\bar{A}A] \overset{4}{=} [R:A][R:\bar{A}] \overset{5}{=} [R:A]^2$.


The equality labeled 1 is Lemma 13.10.3(b), the one labeled 2 is the Main Lemma, which
says that $nR = \bar{A}A$, and 3 is the multiplicative property of the index. The equality 4 follows
from Proposition 13.6.4: $[A:\bar{A}A] = [RA:\bar{A}A] = [R:\bar{A}]$. Finally, the ring $R$ is equal to
its complex conjugate $\bar{R}$, and 5 comes down to the fact that $[\bar{R}:\bar{A}] = [R:A]$.


(b) When $d \equiv 2, 3$ modulo 4, $R$ has the lattice basis $(1, \delta)$, and when $d \equiv 1$ modulo 4, $R$ has
the lattice basis $(1, \eta)$. The area $\Delta(R)$ of the parallelogram spanned by this basis is


(13.7.9)  $\Delta(R) = \begin{cases} \sqrt{|d|} & \text{if } d \equiv 2 \text{ or } 3 \text{ modulo } 4 \\ \tfrac12\sqrt{|d|} & \text{if } d \equiv 1 \text{ modulo } 4. \end{cases}$

So $\mu = \tfrac{2}{\sqrt3}\Delta(R)$. The length of the shortest vector in a lattice is estimated in Lemma
13.10.8: $N(\alpha) \le \tfrac{2}{\sqrt3}\Delta(A)$. We substitute $\Delta(A) = N(A)\Delta(R)$ from part (a) into this
inequality, obtaining $N(\alpha) \le N(A)\mu$. □


Theorem 13.7.10


(a) Every ideal class contains an ideal $A$ with norm $N(A) \le \mu$.


(b) The class group $C$ is generated by the classes of prime ideals $P$ whose norms are prime
integers $p \le \mu$.

(c) The class group $C$ is finite.


Proof of Theorem 13.7.10. (a) Let $A$ be an ideal. We must find an ideal $C$ in the class $(A)$
whose norm is at most $\mu$. We choose a nonzero element $\alpha$ in $A$, with $N(\alpha) \le N(A)\mu$. Then
$A$ contains the principal ideal $(\alpha)$, so $A$ divides $(\alpha)$, i.e., $(\alpha) = AC$ for some ideal $C$, and
$N(A)N(C) = N(\alpha) \le N(A)\mu$. Therefore $N(C) \le \mu$. Now since $AC$ is a principal ideal,
$(C) = (A)^{-1} = (\bar{A})$. This shows that the class $(\bar{A})$ contains an ideal, namely $C$, whose norm
is at most $\mu$. Then the class $(A)$ contains $\bar{C}$, and $N(\bar{C}) = N(C) \le \mu$.


(b) Every class contains an ideal $A$ of norm $N(A) \le \mu$. We factor $A$ into prime ideals:
$A = P_1 \cdots P_k$. Then $N(A) = N(P_1) \cdots N(P_k)$, so $N(P_i) \le \mu$ for each $i$. The classes of
prime ideals with norm $\le \mu$ generate $C$. The norm of a prime ideal $P$ is either a prime
integer $p$ or the square $p^2$ of a prime integer. If $N(P) = p^2$, then $P = (p)$ (13.6.1). This is
a principal ideal, and its class is trivial. We may ignore those primes.


(c) We show that there are finitely many ideals $A$ with norm $N(A) \le \mu$. If we write such an
ideal as a product of prime ideals, $A = P_1 \cdots P_k$, and if $m_i = N(P_i)$, then $m_1 \cdots m_k \le \mu$.
There are finitely many sets of integers $m_i$, each a prime or the square of a prime, that satisfy
this inequality, and there are at most two prime ideals with norms equal to a given integer
$m_i$. So there are finitely many sets of prime ideals such that $N(P_1 \cdots P_k) \le \mu$. □


13.8 COMPUTING THE CLASS GROUP 


The table below lists a few class groups. In the table, $\lfloor\mu\rfloor$ denotes the floor of $\mu$, the largest
integer $\le \mu$. If $n$ is an integer and if $n \le \mu$, then $n \le \lfloor\mu\rfloor$.


   d      ⌊μ⌋    class group

  -2       1     C_1
  -5       2     C_2
  -7       1     C_1
  -14      4     C_4
  -21      5     C_2 × C_2
  -23      2     C_3
  -47      3     C_5
  -71      4     C_7

(13.8.1) Some Class Groups


To apply Theorem 13.7.10, we examine the prime integers $p \le \lfloor\mu\rfloor$. If $p$ splits (or
ramifies) in $R$, we include the class of one of its two prime ideal factors in our set of
generators for the class group. The class of the other prime factor is its inverse. If $p$ remains
prime, its class is trivial and we discard it.


Example 13.8.2 $d = -163$. Since $-163 \equiv 1$ modulo 4, the ring $R$ of integers is $\mathbb{Z}[\eta]$, where
$\eta = \tfrac12(1 + \delta)$, and $\lfloor\mu\rfloor = 7$ because $\mu = \sqrt{163/3} \approx 7.37$. We must inspect the primes $p = 2, 3, 5$, and $7$. If $p$ splits, we
include one of its prime divisors as a generator of the class group. According to Theorem
13.6.1, an integer prime $p$ remains prime in $R$ if and only if the polynomial $x^2 - x + 41$ is
irreducible modulo $p$. This polynomial happens to be irreducible modulo each of the primes
$2$, $3$, $5$, and $7$. So the class group is trivial, and $R$ is a unique factorization domain. □
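The test used in this example can be run for every square-free $d \equiv 1$ modulo 4 in a range. The sketch below is an illustration under stated assumptions: it applies the criterion of Theorem 13.6.1(d) to each prime $p \le \lfloor\mu\rfloor$ and lists the values of $d$ between $-163$ and $-3$ for which all of those primes remain prime, so that Theorem 13.7.10(b) shows the class group to be trivial. The helper names are hypothetical, and $\lfloor\mu\rfloor$ is computed with integer arithmetic as $\lfloor\sqrt{\lfloor |d|/3 \rfloor}\rfloor$.

```python
from math import isqrt

def squarefree(n):
    """True if no square k*k with k >= 2 divides n."""
    return all(n % (k * k) != 0 for k in range(2, isqrt(n) + 1))

def is_prime(p):
    return p > 1 and all(p % k != 0 for k in range(2, isqrt(p) + 1))

def all_small_primes_remain_prime(d):
    """For d = 1 (mod 4): is x^2 - x + h irreducible mod p for every prime p <= mu?"""
    h = (1 - d) // 4
    bound = isqrt(abs(d) // 3)  # floor of mu = sqrt(|d| / 3)
    for p in range(2, bound + 1):
        if is_prime(p) and any((x * x - x + h) % p == 0 for x in range(p)):
            return False        # p splits or ramifies, so the test is inconclusive
    return True

trivial = [d for d in range(-3, -164, -4)
           if squarefree(-d) and all_small_primes_remain_prime(d)]
print(trivial)
```

The list that comes out is $-3, -7, -11, -19, -43, -67, -163$. For the other values of $d$ the test says only that some small prime fails to remain prime; the class group then has to be worked out from the relations among the generators.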


For the rest of this section, we consider cases in which $d \equiv 2$ or $3$ modulo 4. In these
cases, a prime $p$ splits if and only if $x^2 - d$ has a root in $\mathbb{F}_p$. The table below tells us which
primes need to be examined.


[Table: the body of Table 13.8.3 was not recovered]
(13.8.3) Primes Less Than $\mu$


If $d = -1$ or $-2$, there are no primes less than $\mu$, so the class group is trivial, and $R$ is a unique
factorization domain.
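The list of primes to examine follows directly from the definition (13.7.7) of $\mu$. Here is a small sketch that computes $\lfloor\mu\rfloor$ exactly with integer arithmetic and lists the primes up to that bound. The function names are hypothetical, and the only input assumed is a square-free negative integer $d$.

```python
from math import isqrt

def floor_mu(d):
    """Floor of mu from (13.7.7), computed without floating point."""
    n = abs(d)
    # mu^2 equals |d|/3 when d = 1 (mod 4) and 4|d|/3 when d = 2 or 3 (mod 4).
    return isqrt(n // 3) if d % 4 == 1 else isqrt(4 * n // 3)

def primes_to_examine(d):
    """Primes p with p <= floor(mu)."""
    bound = floor_mu(d)
    return [p for p in range(2, bound + 1)
            if all(p % k != 0 for k in range(2, isqrt(p) + 1))]

for d in (-1, -2, -5, -14, -21, -26, -74):
    print(d, floor_mu(d), primes_to_examine(d))
```

Run on the values of $d$ in Table 13.8.1, `floor_mu` reproduces the middle column of that table.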

Let’s suppose that we have determined which of the primes that need to be examined 
split. Then we will have a set of generators for the class group. But to determine its structure 
we still need to determine the relations among these generators. It is best to analyze the 
prime 2 directly. 


Lemma 13.8.4 Suppose that $d \equiv 2$ or $3$ modulo 4. The prime $2$ ramifies in $R$. The prime
divisor $P$ of the principal ideal $(2)$ is

• $P = (2, 1 + \delta)$, if $d \equiv 3$ modulo 4,

• $P = (2, \delta)$, if $d \equiv 2$ modulo 4.

The class $(P)$ has order two in the class group unless $d = -1$ or $-2$. In those cases, $P$ is a
principal ideal. In all cases, the given generators form a lattice basis of the ideal $P$.


Proof. Let $P$ be as in the statement of the lemma. We compute the product ideal $\bar{P}P$. If
$d \equiv 3$ modulo 4, $\bar{P}P = (2, 1 - \delta)(2, 1 + \delta) = (4, 2 + 2\delta, 2 - 2\delta, 1 - d)$, and if $d \equiv 2$ modulo
4, $\bar{P}P = (2, -\delta)(2, \delta) = (4, 2\delta, -d)$. In both cases, $\bar{P}P = (2)$. Theorem 13.6.1 tells us that
the ideal $(2)$ is either a prime ideal or the product of a prime ideal and its conjugate, so $P$
must be a prime ideal.

We note also that $\bar{P} = P$, so $2$ ramifies, $(P) = (P)^{-1}$, and $(P)$ has order 1 or 2 in the
class group. It will have order 1 if and only if it is a principal ideal. This happens when $d = -1$
or $-2$. If $d = -1$, $P = (1 + \delta)$, and if $d = -2$, $P = (\delta)$. When $d < -2$, the integer $2$ has no
proper factor in $R$, and then $P$ is not a principal ideal. □


Corollary 13.8.5 If $d \equiv 2$ or $3$ modulo 4 and $d < -2$, the class number is even. □


Example 13.8.6 $d = -26$. Table 13.8.3 tells us to inspect the primes $p = 2, 3$, and $5$. The
polynomial $x^2 + 26$ is reducible modulo $2$, $3$, and $5$, so all of those primes split. Let's say that


$(2) = \bar{P}P$, $(3) = \bar{Q}Q$, and $(5) = \bar{S}S$.


We have three generators $(P)$, $(Q)$, $(S)$ for the class group, and $(P)$ has order 2. How
can we determine the other relations among these generators? The secret method is to
compute norms of a few elements, hoping to get some information. We don't have to look
far: $N(1 + \delta) = 27 = 3^3$ and $N(2 + \delta) = 30 = 2 \cdot 3 \cdot 5$.

Let $\alpha = 1 + \delta$. Then $\bar\alpha\alpha = 3^3$. Since $(3) = \bar{Q}Q$, we have the ideal relation


$(\bar\alpha)(\alpha) = (\bar{Q}Q)^3$.


Because ideals factor uniquely, the principal ideal $(\alpha)$ is the product of one half of the terms
on the right, and $(\bar\alpha)$ is the product of the conjugates of those terms. We note that $3$ doesn't
divide $\alpha$ in $R$. Therefore $\bar{Q}Q = (3)$ doesn't divide $(\alpha)$. It follows that $(\alpha)$ is either $Q^3$ or
$\bar{Q}^3$. Which it is depends on which prime factor of $(3)$ we label as $Q$.

In either case, $(Q)^3 = 1$, and $(Q)$ has order 1 or 3 in the class group. We check that $3$
has no proper divisor in $R$. Then since $Q$ divides $(3)$, it cannot be a principal ideal. So $(Q)$
has order 3.

Next, let $\beta = 2 + \delta$. Then $\bar\beta\beta = 2 \cdot 3 \cdot 5$, and this gives us the ideal relation


$(\bar\beta)(\beta) = \bar{P}P\,\bar{Q}Q\,\bar{S}S$.


Therefore the principal ideal $(\beta)$ is the product of one half of the ideals on the right and $(\bar\beta)$
is the product of the conjugates of those ideals. We know that $\bar{P} = P$. If we don't care which
prime factors of $(3)$ and $(5)$ we label as $Q$ and $S$, we may assume that $(\beta) = PQS$. This
gives us the relation $(P)(Q)(S) = 1$.

We have found three relations: 


$(P)^2 = 1$, $(Q)^3 = 1$, and $(P)(Q)(S) = 1$.


These relations show that $(Q) = (S)^2$, $(P) = (S)^3$, and that $(S)$ has order 6. The class group
is a cyclic group of order 6, generated by a prime ideal divisor of $5$.
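The search for useful norms is easy to automate. The following sketch, for $d \equiv 2$ or $3$ modulo 4 only, factors $N(a + \delta) = a^2 - d$ for small $a$ and keeps the elements whose norms involve only the primes being examined. The helper names `factor`, `norm`, and `useful_elements`, and the search range `max_a`, are illustrative choices rather than anything prescribed by the text.

```python
def factor(n):
    """Prime factorization of a positive integer as {prime: exponent}."""
    out, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            out[p] = out.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        out[n] = out.get(n, 0) + 1
    return out

def norm(a, b, d):
    """N(a + b*delta) = a^2 - d*b^2, where delta^2 = d."""
    return a * a - d * b * b

def useful_elements(d, primes, max_a=20):
    """Elements a + delta, 1 <= a <= max_a, whose norm involves only the given primes."""
    found = {}
    for a in range(1, max_a + 1):
        f = factor(norm(a, 1, d))
        if set(f) <= set(primes):
            found[a] = f
    return found

for a, f in useful_elements(-26, [2, 3, 5]).items():
    print(f"N({a} + delta) = {norm(a, 1, -26)} = {f}")
```

Each norm found this way yields a relation among the generators, by the same reasoning as for $1 + \delta$ and $2 + \delta$ above. Whether a given relation is new still has to be decided by hand.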


The next lemma explains why the method of computing norms works. 


Lemma 13.8.7 Let $P$, $Q$, $S$ be prime ideals of the ring $R$ of imaginary quadratic integers,
whose norms are the prime integers $p$, $q$, $s$, respectively. Suppose that the relation
$(P)^i(Q)^j(S)^k = 1$ holds in the class group $C$. Then there is an element $\alpha$ in $R$ with norm
equal to $p^i q^j s^k$.


Proof. By definition, $(P)^i(Q)^j(S)^k = (P^iQ^jS^k)$. If $(P^iQ^jS^k) = 1$, the ideal $P^iQ^jS^k$ is
principal, say $P^iQ^jS^k = (\alpha)$. Then

$(\bar\alpha)(\alpha) = (\bar{P}P)^i(\bar{Q}Q)^j(\bar{S}S)^k = (p)^i(q)^j(s)^k = (p^iq^js^k)$.

Therefore $N(\alpha) = \bar\alpha\alpha = p^iq^js^k$. □


We compute one more class group. 


Example 13.8.8 $d = -74$. The primes to inspect are 2, 3, 5, and 7. Here 2 ramifies, 3 and 5
split, and 7 remains prime. Say that $(2) = \bar{P}P$, $(3) = \bar{Q}Q$, and $(5) = \bar{S}S$. Then $\langle P\rangle$, $\langle Q\rangle$,
and $\langle S\rangle$ generate the class group, and $\langle P\rangle$ has order 2 (13.8.4). We note that


$N(1 + \delta) = 75 = 3 \cdot 5^2$
$N(4 + \delta) = 90 = 2 \cdot 3^2 \cdot 5$
$N(13 + \delta) = 243 = 3^5$
$N(14 + \delta) = 270 = 2 \cdot 3^3 \cdot 5$


The norm $N(13 + \delta)$ shows that $\langle Q\rangle^5 = 1$, so $\langle Q\rangle$ has order 1 or 5. Since 3 has no
proper divisor in $R$, $Q$ isn’t a principal ideal. So $\langle Q\rangle$ has order 5. Next, $N(1 + \delta)$ shows
that $\langle S\rangle^2 = \langle Q\rangle$ or $\langle\bar{Q}\rangle$, and therefore $\langle S\rangle$ has order 10. We eliminate $\langle Q\rangle$ from our set of
generators. Finally, $N(4 + \delta)$ gives us one of the relations $\langle P\rangle\langle Q\rangle^2\langle S\rangle = 1$ or $\langle P\rangle\langle\bar{Q}\rangle^2\langle S\rangle = 1$.
Either one allows us to eliminate $\langle P\rangle$ from our list of generators. The class group is cyclic of
order 10, generated by a prime ideal divisor of (5).
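
The claim that 3 has no proper divisor in $R$ can be checked by a finite search: a proper divisor of 3 would have norm 3, because $N(3) = 9$. Here is a small sketch of that search, assuming $d = -74$ so that every integer of the field has the form $a + b\delta$ with $a$, $b$ ordinary integers; the function name `elements_of_norm` is hypothetical.

```python
from math import isqrt


def elements_of_norm(n, d):
    """All integer pairs (a, b) with a*a - d*b*b == n, for a negative integer d."""
    found = []
    b = 0
    while -d * b * b <= n:
        rest = n + d * b * b          # this must equal a*a
        a = isqrt(rest)
        if a * a == rest:
            # A set removes duplicates when a or b is zero.
            found.extend({(a, b), (-a, b), (a, -b), (-a, -b)})
        b += 1
    return sorted(found)


d = -74  # assumption: the value of d in Example 13.8.8
print(elements_of_norm(3, d))      # no element has norm 3
print(elements_of_norm(243, d))    # the elements of norm 3**5
```

The first search comes back empty, which is the statement that 3 has no proper divisor. The second finds $\pm 13 \pm \delta$, in agreement with $N(13 + \delta) = 243$.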


13.9 REAL QUADRATIC FIELDS 


We take a brief look at real quadratic number fields, fields of the form $\mathbb{Q}[\sqrt{d}]$, where $d$ is a
square-free positive integer, and we use the field $\mathbb{Q}[\sqrt{2}]$ as an example. The ring of integers
in this field is a unique factorization domain:


(13.9.1) $R = \mathbb{Z}[\sqrt{2}] = \{a + b\sqrt{2} \mid a, b \in \mathbb{Z}\}$.


It can be shown that unique factorization of ideals into prime ideals is true for the ring 
of integers in any real quadratic number field, and that the class number is finite [Cohn], 
[Hasse]. It is conjectured that there are infinitely many values of d for which the ring of 
integers has unique factorization. 


When $d$ is positive, $\mathbb{Q}[\sqrt{d}]$ is a subfield of the real numbers. Its ring of integers is not
embedded as a lattice in the complex plane. However, we can represent $R$ as a lattice in $\mathbb{R}^2$
by associating to the algebraic integer $a + b\sqrt{d}$ the point $(u, v)$ of $\mathbb{R}^2$, where


(13.9.2) $u = a + b\sqrt{d}$, $v = a - b\sqrt{d}$.


The resulting lattice is depicted below for the case $d = 2$. The reason that the hyperbolas
have been put into the figure will be explained presently.

Recall that the field $\mathbb{Q}[\sqrt{d}]$ is isomorphic to the abstractly constructed field

(13.9.3) $F = \mathbb{Q}[x]/(x^2 - d)$.


If we replace $\mathbb{Q}[\sqrt{d}]$ by $F$ and denote the residue of $x$ in $F$ by $\delta$, then $\delta$ is an abstract square
root of $d$ rather than the positive real square root, and $F$ is the set of elements $a + b\delta$, with
$a$ and $b$ in $\mathbb{Q}$. The coordinates $u$, $v$ represent the two ways that the abstractly defined field
$F$ can be embedded into the real numbers, namely, $u$ sends $\delta \rightsquigarrow \sqrt{d}$ and $v$ sends $\delta \rightsquigarrow -\sqrt{d}$.

For $\alpha = a + b\delta \in \mathbb{Q}[\delta]$, we denote by $\alpha'$ the “conjugate” element $a - b\delta$. The norm
of $\alpha$ is


(13.9.4) $N(\alpha) = \alpha'\alpha = a^2 - b^2d$.

If $\alpha$ is an algebraic integer, then $N(\alpha)$ is an ordinary integer. The norm is multiplicative:

(13.9.5) $N(\alpha\beta) = N(\alpha)N(\beta)$.


However, $N(\alpha)$ is not necessarily positive. It isn’t equal to $|\alpha|^2$.


(13.9.6) The Lattice $\mathbb{Z}[\sqrt{2}]$.


One significant difference between real and imaginary quadratic fields is that the ring
of integers in a real quadratic field always contains infinitely many units. Since the norm of
an algebraic integer is an ordinary integer, a unit must have norm $\pm 1$, and if $N(\alpha) = \pm 1$,
then the inverse of $\alpha$ is $\pm\alpha'$, so $\alpha$ is a unit. For example,


(13.9.7) $\alpha = 1 + \sqrt{2}$, $\alpha^2 = 3 + 2\sqrt{2}$, $\alpha^3 = 7 + 5\sqrt{2}$, ...


are units in the ring $R = \mathbb{Z}[\sqrt{2}]$. The element $\alpha$ has infinite order in the group of units.
The condition $N(\alpha) = a^2 - 2b^2 = \pm 1$ for units translates in $(u, v)$-coordinates to


(13.9.8) $uv = \pm 1$.

So the units are the points of the lattice that lie on one of the two hyperbolas $uv = 1$ and
$uv = -1$, the ones depicted in Figure 13.9.6. It is remarkable that the ring of integers in a real
quadratic field always has infinitely many units or, what amounts to the same thing, that the
lattice always contains infinitely many points on these hyperbolas. This is far from obvious,
either algebraically or geometrically, but a few such points are visible in the figure.
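
A short computation makes the first few of these points concrete. The sketch below stores $a + b\sqrt{2}$ as the integer pair `(a, b)`; the names `multiply` and `norm` are illustrative, and the floating-point values of $u$ and $v$ are only approximations used to display the product $uv$.

```python
def multiply(x, y, d=2):
    """Product of a + b*sqrt(d) and c + e*sqrt(d), stored as integer pairs."""
    a, b = x
    c, e = y
    return (a * c + d * b * e, a * e + b * c)


def norm(x, d=2):
    """Norm a*a - d*b*b of the element a + b*sqrt(d)."""
    a, b = x
    return a * a - d * b * b


alpha = (1, 1)            # 1 + sqrt(2)
powers = [alpha]
for _ in range(4):
    powers.append(multiply(powers[-1], alpha))

for a, b in powers:
    u = a + b * 2 ** 0.5  # approximate coordinates in the (u, v)-plane
    v = a - b * 2 ** 0.5
    print((a, b), norm((a, b)), round(u * v, 6))
```

The norms alternate between $-1$ and $1$, so the powers of $1 + \sqrt{2}$ alternate between the two hyperbolas, and the coefficients grow at each step, which is consistent with the element having infinite order.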


Theorem 13.9.9 Let R be the ring of integers in a real quadratic number field. The group of 
units in R is an infinite group. 


We have arranged the proof as a sequence of lemmas. The first one follows from 
Lemma 13.10.8 in the next section. 


Lemma 13.9.10 For every $\Delta_0 > 0$, there exists an $r > 0$ with the following property: Let $L$
be a lattice in the $(u, v)$-plane $P$, let $\Delta(L)$ denote the area of the parallelogram spanned
by a lattice basis, and suppose that $\Delta(L) < \Delta_0$. Then $L$ contains a nonzero element $\gamma$ with
$|\gamma| < r$. □


Let $\Delta_0$ and $r$ be as above. For $s > 0$, we denote by $D_s$ the elliptical disk in the $(u, v)$
plane defined by the inequality $s^{-2}u^2 + s^2v^2 < r^2$. So $D_1$ is the circular disk of radius $r$. The
figure below shows three of the disks $D_s$.


(13.9.11) Elliptical Disks that Contain Points of the Lattice. 


Lemma 13.9.12 With notation as above, let $L$ be a lattice that contains no point on the
coordinate axes except the origin, and such that $\Delta(L) < \Delta_0$.


(a) For any $s > 0$, the elliptical disk $D_s$ contains a nonzero element of $L$.


(b) For any point $\alpha = (u, v)$ in the disk $D_s$, $|uv| < \frac{r^2}{2}$.


Proof. (a) The map $\varphi: \mathbb{R}^2 \to \mathbb{R}^2$ defined by $\varphi(x, y) = (sx, s^{-1}y)$ maps $D_1$ to $D_s$. The
inverse image $L' = \varphi^{-1}L$ of $L$ contains no point on the axes except the origin. We note that
$\varphi$ is an area-preserving map, because it multiplies one coordinate by $s$ and the other by $s^{-1}$.
Therefore $\Delta(L') < \Delta_0$. Lemma 13.9.10 shows that the circular disk $D_1$ contains a nonzero
element of $L'$, say $\gamma$. Then $\alpha = \varphi(\gamma)$ is an element of $L$ in the elliptical disk $D_s$.


(b) The inequality is true for the circular disk $D_1$. Let $\varphi$ be the map defined above. If
$\alpha = (u, v)$ is in $D_s$, then $\varphi^{-1}(\alpha) = (s^{-1}u, sv)$ is in $D_1$, so $|uv| = |(s^{-1}u)(sv)| < \frac{r^2}{2}$. □


Lemma 13.9.13 With the hypotheses of the previous lemma, the lattice $L$ contains infinitely
many points $(u, v)$ with $|uv| < \frac{r^2}{2}$.


Proof. We apply the previous lemma. For large $s$, the disk $D_s$ is very narrow, and it contains
a nonzero element of $L$, say $\alpha_s$. The elements $\alpha_s$ cannot lie on the $e_1$-axis but they must
become arbitrarily close to that axis as $s$ tends to infinity. It follows that there are infinitely
many points among them, and if $\alpha_s = (u_s, v_s)$, then $|u_sv_s| < \frac{r^2}{2}$. □

Let $R$ be the ring of integers in a real quadratic field, and let $n$ be an integer. We call
two elements $\beta_i$ of $R$ congruent modulo $n$ if $n$ divides $\beta_1 - \beta_2$ in $R$. When $d \equiv 2$ or 3 modulo
4 and $\beta_i = m_i + n_i\delta$, this simply means that $m_1 \equiv m_2$ and $n_1 \equiv n_2$ modulo $n$. The same is
true when $d \equiv 1$ modulo 4, except that one has to write $\beta_i = m_i + n_i\eta$. In all cases, there are
$n^2$ congruence classes modulo $n$.


Theorem 13.9.9 follows from the next lemma. 


Lemma 13.9.14 Let R be the ring of integers in a real quadratic number field. 


(a) There is a positive integer $n$ such that the set $S$ of elements of $R$ with norm $n$ is infinite.
Moreover, there are infinitely many pairs of elements of $S$ that are congruent modulo $n$.


(b) If two elements $\beta_1$ and $\beta_2$ of $R$ with norm $n$ are congruent modulo $n$, then $\beta_2/\beta_1$ is a
unit of $R$.


Proof. (a) The lattice $R$ contains no point on the axes other than the origin, because $u$ and
$v$ aren’t zero unless both $a$ and $b$ are zero. If $\alpha$ is an element of $R$ whose image in the
plane is the point $(u, v)$, then $|N(\alpha)| = |uv|$. Lemma 13.9.13 shows that $R$ contains infinitely
many points with norm in a bounded interval. Since there are finitely many integers $n$ in that
interval, the set of elements of $R$ with norm $n$ is infinite for at least one of them. The fact
that there are finitely many congruence classes modulo $n$ proves the second assertion.


(b) We show that $\beta_2/\beta_1$ is in $R$. The same argument will show that $\beta_1/\beta_2$ is in $R$, hence that
$\beta_2/\beta_1$ is a unit. Since $\beta_1$ and $\beta_2$ are congruent, we can write $\beta_2 = \beta_1 + n\gamma$, with $\gamma$ in $R$. Let
$\beta_1'$ be the conjugate of $\beta_1$. So $\beta_1\beta_1' = n$. Then $\beta_2/\beta_1 = (\beta_1 + n\gamma)/\beta_1 = 1 + \beta_1'\gamma$. This is an
element of $R$, as claimed. □
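
Part (b) of the lemma can be watched in action in $\mathbb{Z}[\sqrt{2}]$. The sketch below is an illustration under assumptions that are not in the text: it takes $n = 7$, searches a finite box of coefficients for elements of norm 7, groups them by congruence class modulo 7, and divides two elements of the same class. The names `norm` and `quotient` and the size of the search box are hypothetical choices.

```python
def norm(x, d=2):
    """Norm a*a - d*b*b of the element a + b*sqrt(d)."""
    a, b = x
    return a * a - d * b * b


def quotient(beta2, beta1, d=2):
    """Return beta2/beta1 as an integer pair, or None if it is not in Z[sqrt(d)].

    Uses beta2/beta1 = beta2 * beta1' / N(beta1), where beta1' is the conjugate.
    """
    a, b = beta1
    c, e = beta2
    n = norm(beta1, d)
    re, im = c * a - d * e * b, e * a - c * b   # beta2 * (a - b*sqrt(d))
    if re % n or im % n:
        return None
    return (re // n, im // n)


n = 7                     # assumption: a norm that occurs, since N(3 + sqrt(2)) = 7
box = range(-60, 61)      # assumption: a search box large enough to contain a pair
by_class = {}
for a in box:
    for b in box:
        if norm((a, b)) == n:
            by_class.setdefault((a % n, b % n), []).append((a, b))

# Pick the first congruence class that contains at least two elements of norm n.
beta1, beta2 = next(v[:2] for v in by_class.values() if len(v) >= 2)
unit = quotient(beta2, beta1)
print(beta1, beta2, unit, norm(unit))
```

Whichever congruent pair the search finds first, the quotient has integer coefficients and norm $\pm 1$, as the lemma predicts. For instance, $13 - 9\sqrt{2}$ and $27 + 19\sqrt{2}$ both have norm 7 and are congruent modulo 7, and their quotient is $99 + 70\sqrt{2}$, of norm 1.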


13.10 ABOUT LATTICES 


A lattice $L$ in the plane $\mathbb{R}^2$ is generated, or spanned, by a set $S$ if every element of $L$ can
be written as an integer combination of elements of $S$. Every lattice $L$ has a lattice basis
$\mathbf{B} = (v_1, v_2)$ consisting of two elements. An element of $L$ is an integer combination of the
lattice basis vectors in exactly one way (see (6.5.5)).

Some notation:


(13.10.1)


$\Pi(\mathbf{B})$ : the parallelogram of linear combinations $r_1v_1 + r_2v_2$ with $0 \le r_i \le 1$.
Its vertices are $0$, $v_1$, $v_2$, and $v_1 + v_2$.
$\Pi'(\mathbf{B})$ : the set of linear combinations $r_1v_1 + r_2v_2$ with $0 \le r_i < 1$. It is obtained
by deleting the edges $[v_1, v_1 + v_2]$ and $[v_2, v_1 + v_2]$ from $\Pi(\mathbf{B})$.
$\Delta(L)$ : the area of $\Pi(\mathbf{B})$.
$[M:L]$ : the index of a sublattice $L$ of a lattice $M$, that is, the number of additive cosets of $L$ in $M$.


We will see that $\Delta(L)$ is independent of the lattice basis, so that notation isn’t ambiguous.
The other notation has been introduced before. For reference, we recall Lemma 6.5.8:


Lemma 13.10.2 Let $\mathbf{B} = (v_1, v_2)$ be a basis of $\mathbb{R}^2$, and let $L$ be the lattice of integer
combinations of $\mathbf{B}$. Every vector $v$ in $\mathbb{R}^2$ can be written uniquely in the form $v = w + v_0$,
with $w$ in $L$ and $v_0$ in $\Pi'(\mathbf{B})$. □


Lemma 13.10.3 Let $K \subset L \subset M$ be lattices in the plane, and let $\mathbf{B}$ be a lattice basis for $L$.
Then

(a) $[M:K] = [M:L][L:K]$.

(b) For any positive integer $n$, $[L:nL] = n^2$.

(c) For any positive real number $r$, $[M:L] = [rM:rL]$.

(d) $[M:L]$ is finite, and is equal to the number of points of $M$ in the region $\Pi'(\mathbf{B})$.

(e) The lattice $M$ is generated by $L$ together with the finite set $M \cap \Pi'(\mathbf{B})$.


Proof. (d),(e) We can write an element $x$ of $M$ uniquely in the form $v + y$, where $v$ is in $L$
and $y$ is in $\Pi'(\mathbf{B})$. Then $v$ is in $M$, and so $y$ is in $M$ too. Therefore $x$ is in the coset $y + L$.
This shows that the elements of $M \cap \Pi'(\mathbf{B})$ are representative elements for the cosets of $L$
in $M$. Since there is only one way to write $x = v + y$, these cosets are distinct. Since $M$ is
discrete and $\Pi'(\mathbf{B})$ is a bounded set, $M \cap \Pi'(\mathbf{B})$ is finite.


(13.10.4) [Figure: the lattice $L = \{\cdot\}$ and the sublattice $3L = \{*\}$.]

Formula (a) is the multiplicative property of the index (2.8.14). (b) follows from (a),
because the lattice $nL$ is obtained by stretching $L$ by the factor $n$, as is illustrated above for
the case that $n = 3$. (c) is true because multiplication by $r$ stretches both lattices by the same
amount. □


Corollary 13.10.5 Let $L \subset M$ be lattices in $\mathbb{R}^2$. There are finitely many lattices between $L$
and $M$.


Proof. Let $\mathbf{B}$ be a lattice basis for $L$, and let $N$ be a lattice with $L \subset N \subset M$. Lemma
13.10.3(e) shows that $N$ is generated by $L$ and by the set $N \cap \Pi'(\mathbf{B})$, which is a subset of the
finite set $M \cap \Pi'(\mathbf{B})$. A finite set has finitely many subsets. □


Proposition 13.10.6 If $L \subset M$ are lattices in the plane, $[M:L] = \dfrac{\Delta(L)}{\Delta(M)}$.


Proof. Say that $\mathbf{C}$ is the lattice basis $(u_1, u_2)$ of $M$. Let $n$ be a large positive integer, and let
$M_n$ denote the lattice with basis $\mathbf{C}_n = (\frac{1}{n}u_1, \frac{1}{n}u_2)$. Let $\Gamma'$ denote the small region $\Pi'(\mathbf{C}_n)$.
Its area is $\frac{1}{n^2}\Delta(M)$. The translates $x + \Gamma'$ of $\Gamma'$ with $x$ in $M_n$ cover the plane without
overlap, and there is exactly one element of $M_n$ in each translate $x + \Gamma'$, namely $x$. (This is
Lemma 13.10.2.)

Let $\mathbf{B}$ be a lattice basis for $L$. We approximate the area of $\Pi(\mathbf{B})$ in the way
that one approximates a double integral, using translates of $\Gamma'$. Let $r = [M:L]$. Then
$[M_n:L] = [M_n:M][M:L] = n^2r$. Lemma 13.10.3(d) tells us that the region $\Pi'(\mathbf{B})$ contains
$n^2r$ points of the lattice $M_n$. Since the translates of $\Gamma'$ cover the plane, the translates by
these $n^2r$ points cover $\Pi(\mathbf{B})$ approximately.


$\Delta(L) \approx n^2r\,\Delta(M_n) = r\,\Delta(M) = [M:L]\,\Delta(M)$.


The error in this approximation comes from the fact that $\Pi'(\mathbf{B})$ is not covered precisely
along its boundary. One can bound this error in terms of the length of the boundary of $\Pi(\mathbf{B})$
and the diameter of $\Gamma'$ (its largest linear dimension). The diameter tends to zero as $n \to \infty$,
and so does the error. □
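
For sublattices of the integer lattice, the proposition can be tested by direct counting, since $\Delta(\mathbb{Z}^2) = 1$ and the area of a parallelogram is the absolute value of a determinant. The following sketch counts the integer points in the half-open parallelogram, as in Lemma 13.10.3(d), and compares the count with the area. The function name `index_by_counting` and the sample basis are illustrative assumptions, and the two basis vectors are assumed to be independent.

```python
def index_by_counting(v1, v2):
    """Count integer points r1*v1 + r2*v2 with 0 <= r1 < 1 and 0 <= r2 < 1.

    v1 and v2 are independent integer vectors spanning a sublattice L of
    M = Z^2. Returns (number of points, area of the parallelogram).
    """
    (a, b), (c, e) = v1, v2
    det = a * e - b * c
    sign = 1 if det > 0 else -1
    size = abs(det)
    xs = range(min(0, a, c, a + c), max(0, a, c, a + c) + 1)
    ys = range(min(0, b, e, b + e), max(0, b, e, b + e) + 1)
    count = 0
    for x in xs:
        for y in ys:
            # Cramer's Rule: r1 = (x*e - y*c)/det and r2 = (a*y - b*x)/det.
            n1 = sign * (x * e - y * c)
            n2 = sign * (a * y - b * x)
            if 0 <= n1 < size and 0 <= n2 < size:
                count += 1
    return count, size


points, area = index_by_counting((2, 1), (1, 3))
print(points, area)
```

For the basis $((2,1)^t, (1,3)^t)$ the count and the area are both 5. Working with the integer numerators from Cramer's Rule, rather than with the fractions $r_1$ and $r_2$, keeps the test for the half-open region exact.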


Corollary 13.10.7 The area $\Delta(L)$ of the parallelogram $\Pi(\mathbf{B})$ is independent of the lattice
basis $\mathbf{B}$.


This follows when one sets $M = L$ in the previous proposition. □

Lemma 13.10.8 Let $v$ be a nonzero element of minimal length of a lattice $L$. Then
$|v|^2 \le \frac{2}{\sqrt{3}}\Delta(L)$.
The inequality becomes an equality for an equilateral triangular lattice.


Proof. We choose an element $v_1$ of $L$ of minimal length. Then $v_1$ generates the subgroup
$L \cap \ell$, where $\ell$ is the line spanned by $v_1$, and there is an element $v_2$ such that $(v_1, v_2)$ is a
lattice basis of $L$ (see the proof of (6.5.5)). A change of scale changes $|v_1|^2$ and $\Delta(L)$ by the
same factor, so we may assume that $|v_1| = 1$. We position coordinates so that $v_1 = (1, 0)^t$.


Say that $v_2 = (b_1, b_2)^t$. We may assume that $b_2$ is positive. Then $\Delta(L) = b_2$. We may
also adjust $v_2$ by adding a multiple of $v_1$, to make $-\frac{1}{2} \le b_1 < \frac{1}{2}$, so that $b_1^2 \le \frac{1}{4}$. Since $v_1$
has minimal length among nonzero elements of $L$, $|v_2|^2 = b_1^2 + b_2^2 \ge |v_1|^2 = 1$. Therefore
$b_2^2 \ge \frac{3}{4}$. Thus $\Delta(L) = b_2 \ge \frac{\sqrt{3}}{2}$, and $|v_1|^2 = 1 \le \frac{2}{\sqrt{3}}\Delta(L)$. □
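
The inequality of Lemma 13.10.8 can be checked numerically on small examples. The sketch below finds a shortest nonzero vector by brute force over a box of integer coefficients and compares its squared length with $\frac{2}{\sqrt{3}}$ times the area of the parallelogram spanned by the basis. The function name `shortest_vector_check` and the coefficient bound are assumptions made for illustration; the bound is adequate for the two bases tried here but not for an arbitrary, badly skewed basis.

```python
from math import sqrt


def shortest_vector_check(v1, v2, bound=20):
    """Return (minimal squared length, (2/sqrt(3)) * area) for the lattice of v1, v2.

    The minimum is taken over m*v1 + n*v2 with |m|, |n| <= bound, (m, n) != (0, 0).
    """
    area = abs(v1[0] * v2[1] - v1[1] * v2[0])
    best = min(
        (m * v1[0] + n * v2[0]) ** 2 + (m * v1[1] + n * v2[1]) ** 2
        for m in range(-bound, bound + 1)
        for n in range(-bound, bound + 1)
        if (m, n) != (0, 0)
    )
    return best, 2 / sqrt(3) * area


print(shortest_vector_check((1, 0), (0, 1)))               # square lattice
print(shortest_vector_check((1, 0), (0.5, sqrt(3) / 2)))   # equilateral lattice
```

For the square lattice the inequality is strict, while for the equilateral triangular lattice the two numbers agree up to rounding error, which matches the equality case stated in the lemma.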


Nullum vero dubium nobis esse videtur, 
quin multa eaque egregia in hoc genere adhuc lateant 
in quibus alii vires suas exercere possint. 


—Carl Friedrich Gauss 


EXERCISES 


Section 1 Algebraic Integers 
1.1. Is ae + /5) an algebraic integer? 


1.2. Prove that the integers in $\mathbb{Q}[\sqrt{d}]$ form a ring.
1.3. (a) Let $\alpha$ be a complex number that is the root of a monic integer polynomial, not
necessarily an irreducible polynomial. Prove that $\alpha$ is an algebraic integer.
(b) Let $\alpha$ be an algebraic number that is the root of an integer polynomial $f(x) =
a_nx^n + a_{n-1}x^{n-1} + \cdots + a_0$. Prove that $a_n\alpha$ is an algebraic integer.
(c) Let $\alpha$ be an algebraic integer that is the root of a monic integer polynomial
$x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$. Prove that $\alpha^{-1}$ is an algebraic integer if and only if
$a_0 = \pm 1$.


1.4. Let $d$ and $d'$ be integers. When are the fields $\mathbb{Q}(\sqrt{d})$ and $\mathbb{Q}(\sqrt{d'})$ distinct?


Section 2 Factoring Algebraic Integers 


2.1. Prove that 2, 3, and $1 \pm \sqrt{-5}$ are irreducible elements of the ring $R = \mathbb{Z}[\sqrt{-5}]$ and that the
units of this ring are $\pm 1$.


2.2. For which negative integers $d \equiv 2$ modulo 4 is the ring of integers in $\mathbb{Q}[\sqrt{d}]$ a unique
factorization domain?


Section 3 Ideals in $\mathbb{Z}[\sqrt{-5}]$


3.1. Let $\alpha$ be an element of $R = \mathbb{Z}[\delta]$, $\delta = \sqrt{-5}$, and let $\gamma = \frac{1}{2}(\alpha + \alpha\delta)$. Under what
circumstances is the lattice with basis $(\alpha, \gamma)$ an ideal?


3.2. Let $\delta = \sqrt{-5}$. Decide whether or not the lattice of integer combinations of the given
vectors is an ideal: (a) (5,1+64), (b) (7,1+54), (c) (4— 26,2+ 25, 64+ 46).
3.3. Let $A$ be an ideal of the ring of integers $R$ in an imaginary quadratic field. Prove that
there is a lattice basis for $A$, one of whose elements is an ordinary positive integer.


3.4. For each ring $R$ listed below, use the method of Proposition 13.3.3 to describe the ideals
in $R$. Make a drawing showing the possible shapes of the lattices in each case.


(a) $R = \mathbb{Z}[\sqrt{-3}]$, (b) $R = \mathbb{Z}[\frac{1}{2}(1 + \sqrt{-3})]$, (c) $R = \mathbb{Z}[\sqrt{-6}]$,
(d) $R = \mathbb{Z}[\frac{1}{2}(1 + \sqrt{-7})]$, (e) $R = \mathbb{Z}[\sqrt{-10}]$


Section 4 Ideal Multiplication 


4.1. Let $R = \mathbb{Z}[\sqrt{-6}]$. Find a lattice basis for the product ideal $AB$, where $A = (2, \delta)$ and
$B = (3, \delta)$.

4.2. Let $R$ be the ring $\mathbb{Z}[\delta]$, where $\delta = \sqrt{-5}$, and let $A$ denote the ideal generated by the
elements (a) 3+56, 2426, (b) 44+65,14+26. Decide whether or not the given generators
form a lattice basis for $A$, and identify the ideal $\bar{A}A$.

4.3. Let $R$ be the ring $\mathbb{Z}[\delta]$, where $\delta = \sqrt{-5}$, and let $A$ and $B$ be ideals of the form
$A = (\alpha, \frac{1}{2}(\alpha + \alpha\delta))$, $B = (\beta, \frac{1}{2}(\beta + \beta\delta))$. Prove that $AB$ is a principal ideal by finding a
generator.


Section 5 Factoring Ideals 
5.1. Let $R = \mathbb{Z}[\sqrt{-5}]$.
(a) Decide whether or not 11 is an irreducible element of $R$ and whether or not (11) is a
prime ideal of $R$.
(b) Factor the principal ideal (14) into prime ideals in $\mathbb{Z}[\delta]$.


5.2. Let $\delta = \sqrt{-3}$ and $R = \mathbb{Z}[\delta]$. This is not the ring of integers in the imaginary quadratic
number field $\mathbb{Q}[\delta]$. Let $A$ be the ideal $(2, 1 + \delta)$.


(a) Prove that $A$ is a maximal ideal, and identify the quotient ring $R/A$.

(b) Prove that $\bar{A}A$ is not a principal ideal, and that the Main Lemma is not true for this
ring.

(c) Prove that $A$ contains the principal ideal (2) but that $A$ does not divide (2).


5.3. Let $f = y^2 - x^3 - x$. Is the ring $\mathbb{C}[x, y]/(f)$ an integral domain?


Section 6 Prime Ideals and Prime Integers 
6.1. Let $d = -14$. For each of the primes $p = 2, 3, 5, 7, 11$, and 13, decide whether or not $p$
splits or ramifies in $R$, and if so, find a lattice basis for a prime ideal factor of $(p)$.

6.2. Suppose that $d$ is a negative integer, and that $d \equiv 1$ modulo 4. Analyze whether or not 2
remains prime in $R$ in terms of congruence modulo 8.


6.3. Let $R$ be the ring of integers in an imaginary quadratic field.

(a) Suppose that an integer prime $p$ remains prime in $R$. Prove that $R/(p)$ is a field with
$p^2$ elements.

(b) Prove that if $p$ splits but does not ramify, then $R/(p)$ is isomorphic to the product
ring $\mathbb{F}_p \times \mathbb{F}_p$.


6.4. When $d$ is congruent 2 or 3 modulo 4, an integer prime $p$ remains prime in the ring of
integers of $\mathbb{Q}[\sqrt{d}]$ if the polynomial $x^2 - d$ is irreducible modulo $p$.

(a) Prove that this is also true when $d \equiv 1$ modulo 4 and $p \ne 2$.
(b) What happens to $p = 2$ when $d \equiv 1$ modulo 4?


6.5. Assume that $d$ is congruent 2 or 3 modulo 4.

(a) Prove that a prime integer $p$ ramifies in $R$ if and only if $p = 2$ or $p$ divides $d$.

(b) Let $p$ be an integer prime that ramifies, and say that $(p) = P^2$. Find an explicit lattice
basis for $P$. In which cases is $P$ a principal ideal?


6.6. Let $d$ be congruent to 2 or 3 modulo 4. An integer prime might be of the form $a^2 - b^2d$,
with $a$ and $b$ in $\mathbb{Z}$. How is this related to the prime ideal factorization of $(p)$ in the ring of
integers $R$?


6.7. Suppose that $d \equiv 2$ or 3 modulo 4, and that a prime $p \ne 2$ does not remain prime in $R$. Let
$a$ be an integer such that $a^2 \equiv d$ modulo $p$. Prove that $(p, a + \delta)$ is a lattice basis for a
prime ideal that divides $(p)$.


Section 7 Ideal Classes 


7.1. Let $R = \mathbb{Z}[\sqrt{-5}]$, and let $B = (3, 1 + \delta)$. Find a generator for the principal ideal $B^2$.


7.2. Prove that two nonzero ideals $A$ and $A'$ in the ring of integers in an imaginary quadratic
field are similar if and only if there is a nonzero ideal $C$ such that both $AC$ and $A'C$ are
principal ideals.


7.3. Let $d = -26$. With each of the following integers $n$, decide whether $n$ is the norm of an
element $\alpha$ of $R$. If it is, find $\alpha$: $n = 75$, 250, 375, 5°.


7.4. Let $R = \mathbb{Z}[\delta]$, where $\delta^2 = -6$.

(a) Prove that the lattices $P = (2, \delta)$ and $Q = (3, \delta)$ are prime ideals of $R$.
(b) Factor the principal ideal (6) into prime ideals explicitly in $R$.

(c) Determine the class group of $R$.


Section 8 Computing the Class Group 


8.1. With reference to Example 13.8.6, since $\langle P\rangle = \langle S\rangle^3$ and $\langle Q\rangle = \langle S\rangle^2$, Lemma 13.8.7
predicts that there are elements whose norms are $2 \cdot 5^3$ and $3^2 \cdot 5^2$. Find such elements.


8.2. With reference to Example 13.8.8, explain why $N(4 + \delta)$ and $N(14 + \delta)$ don’t lead to
contradictory conclusions.


8.3. Let $R = \mathbb{Z}[\delta]$, with $\delta = \sqrt{-29}$. In each case, compute the norm, explain what conclusions
one can draw about ideals in $R$ from the norm computation, and determine the class
group of $R$: N(1 + 86), N(4+ 4), N(S +46), N(9+ 26), N(11 + 26).


8.4. Prove that the values of $d$ listed in Theorem 13.2.5 have unique factorization.


8.5. Determine the class group and draw the possible shapes of the lattices in each case:
(a) $d = -10$, (b) $d = -13$, (c) $d = -14$, (d) $d = -21$.


8.6. Determine the class group in each case:
(a) $d = -41$, (b) $d = -57$, (c) $d = -61$, (d) $d = -77$, (e) $d = -89$.
Section 9 Real Quadratic Fields


9.1. Prove that $1 + \sqrt{2}$ is an element of infinite order in the group of units of $\mathbb{Z}[\sqrt{2}]$.

9.2. Determine the solutions of the equation $x^2 - y^2d = 1$ when $d$ is a positive integer.

9.3. (a) Prove that the size function $\sigma(\alpha) = |N(\alpha)|$ makes the ring $\mathbb{Z}[\sqrt{2}]$ into a Euclidean
domain, and that this ring has unique factorization.

(b) Make a sketch showing the principal ideal $(\sqrt{2})$ of $R = \mathbb{Z}[\sqrt{2}]$, in the embedding
depicted in Figure 13.9.6.

9.4. Let $R$ be the ring of integers in a real quadratic number field. What structures are possible
for the group of units in $R$?


9.5. Let $R$ be the ring of integers in a real quadratic number field, and let $U_0$ denote the set
of units of $R$ that are in the first quadrant in the embedding (13.9.2).

(a) Prove that $U_0$ is an infinite cyclic subgroup of the group of units.
(b) Find a generator for $U_0$ when $d = 3$ and when $d = 5$.

(c) Draw a figure showing the hyperbolas and the units in a reasonable size range for
$d = 3$.


Section 10 About Lattices 


10.1. Let $M$ be the integer lattice in $\mathbb{R}^2$, and let $L$ be the lattice with basis $((2, 3)^t, (3, 6)^t)$.
Determine the index $[M:L]$.


10.2. Let $L \subset M$ be lattices with bases $\mathbf{B}$ and $\mathbf{C}$, respectively, and let $A$ be the integer matrix
such that $\mathbf{B}A = \mathbf{C}$. Prove that $[M:L] = |\det A|$.


Miscellaneous Problems 


M.1. Describe the subrings $S$ of $\mathbb{C}$ that are lattices in the complex plane.

*M.2. Let $R = \mathbb{Z}[\delta]$, where $\delta = \sqrt{-5}$, and let $p$ be a prime integer.

(a) Prove that if $p$ splits in $R$, say $(p) = \bar{P}P$, then exactly one of the ellipses $x^2 + 5y^2 = p$
or $x^2 + 5y^2 = 2p$ contains an integer point.

(b) Find a property that determines which ellipse has an integer point.


M.3. Describe the prime ideals in (a) the polynomial ring $\mathbb{C}[x, y]$ in two variables,
(b) the ring $\mathbb{Z}[x]$ of integer polynomials.


M.4. Let $L$ denote the integer lattice $\mathbb{Z}^2$ in the plane $\mathbb{R}^2$, and let $P$ be a polygon in the plane
whose vertices are points of $L$. Pick’s Theorem asserts that the area $\Delta(P)$ is equal to
$a + b/2 - 1$, where $a$ is the number of points of $L$ in the interior of $P$, and $b$ is the number
of points of $L$ on the boundary of $P$.


(a) Prove Pick’s Theorem.
(b) Derive Proposition 13.10.6 from Pick’s Theorem.


CHAPTER 14 


Linear Algebra in a Ring 


Be wise! Generalize! 


—Picayune Sentinel 


Solving linear equations is a basic problem of linear algebra. We consider systems $AX = B$
when the entries of $A$ and $B$ are in a ring $R$ here, and we ask for solutions $X = (x_1, \ldots, x_n)^t$
with x; in R. This becomes difficult when the ring R is complicated, but we will see how it 
can be solved when R is the ring of integers or a polynomial ring over a field. 


14.1. MODULES 
The analog for a ring R of a vector space over a field is called a module. 


• Let $R$ be a ring. An $R$-module $V$ is an abelian group with a law of composition written $+$,
and a scalar multiplication $R \times V \to V$, written $r, v \rightsquigarrow rv$, that satisfy these axioms:


(14.1.1) $1v = v$, $(rs)v = r(sv)$, $(r + s)v = rv + sv$, and $r(v + v') = rv + rv'$,


for all $r$ and $s$ in $R$ and all $v$ and $v'$ in $V$.


These are precisely the axioms for a vector space (3.1.2). However, the fact that elements of
a ring needn’t be invertible makes modules more complicated.

Our first examples are the modules $R^n$ of $R$-vectors, column vectors with entries in the
ring. They are called free modules. The laws of composition for $R$-vectors are the same as
for vectors with entries in a field:

$$\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} + \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{bmatrix} \quad\text{and}\quad r\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} ra_1 \\ \vdots \\ ra_n \end{bmatrix}.$$

But when $R$ isn’t a field, it is no longer true that they are the only modules. There will be
modules that aren’t isomorphic to any free module, though they are spanned by a finite set.


An abelian group $V$, its law of composition written additively, can be made into
a module over the integers in exactly one way. The distributive law forces us to set
$2v = (1 + 1)v = v + v$, and so on:


$nv = v + \cdots + v$ = “$n$ times $v$”

and $(-n)v = -(nv)$, for any positive integer $n$. It is intuitively plausible that this makes $V$ into a
$\mathbb{Z}$-module, and also that it is the only way to do so. Let’s not bother with a formal proof.

Conversely, any $\mathbb{Z}$-module has the structure of an abelian group, given by keeping only
the addition law and forgetting about its scalar multiplication.


(14.1.2) Abelian group and $\mathbb{Z}$-module are equivalent concepts.


We must use additive notation in the abelian group in order to make this correspondence 
seem natural, and we do so throughout the chapter. 

Abelian groups provide examples to show that modules over a ring needn’t be free.
Since $\mathbb{Z}^n$ is infinite when $n$ is positive, no finite abelian group except the zero group is
isomorphic to a free module.
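
As an illustration, here is a small sketch that builds the scalar multiplication on the cyclic group $\mathbb{Z}/6$ purely from its addition law, checks the axioms (14.1.1) over a finite range of integer scalars, and confirms that multiplication by 6 kills every element. The choice of group, the range of scalars tested, and the function names are assumptions made for the example; a finite check like this illustrates the axioms but does not prove them.

```python
from itertools import product

MODULUS = 6  # the cyclic group Z/6, written additively


def add(v, w):
    """The group law of Z/6."""
    return (v + w) % MODULUS


def scalar(n, v):
    """n*v defined only through the group law: v + ... + v, or its negative."""
    total = 0
    for _ in range(abs(n)):
        total = add(total, v)
    return total if n >= 0 else (-total) % MODULUS


group = range(MODULUS)
scalars = range(-7, 8)
axioms_hold = all(
    scalar(1, v) == v
    and scalar(r * s, v) == scalar(r, scalar(s, v))
    and scalar(r + s, v) == add(scalar(r, v), scalar(s, v))
    and scalar(r, add(v, w)) == add(scalar(r, v), scalar(r, w))
    for r, s in product(scalars, repeat=2)
    for v, w in product(group, repeat=2)
)
killed_by_6 = all(scalar(6, v) == 0 for v in group)
print(axioms_hold, killed_by_6)
```

Because $6v = 0$ for every $v$ while the scalar 6 is not zero, no nonzero element of this group can belong to an independent set, which is another way to see that the module is not free.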


A submodule $W$ of an $R$-module $V$ is a nonempty subset that is closed under addition
and scalar multiplication. The laws of composition on $V$ make a submodule $W$ into a module.
We’ve seen submodules in one case before, namely submodules of the ring $R$, when it is
thought of as the free $R$-module $R^1$.


Proposition 14.1.3 The submodules of the $R$-module $R$ are the ideals of $R$.


By definition, an ideal is a nonempty subset of $R$ that is closed under addition and under
multiplication by elements of $R$. □


The definition of a homomorphism $\varphi: V \to W$ of $R$-modules copies that of a linear
transformation of vector spaces. It is a map compatible with the laws of composition:

(14.1.4) $\varphi(v + v') = \varphi(v) + \varphi(v')$ and $\varphi(rv) = r\varphi(v)$,


for all $v$ and $v'$ in $V$ and $r$ in $R$. An isomorphism is a bijective homomorphism. The kernel of
a homomorphism $\varphi: V \to W$, the set of elements $v$ in $V$ such that $\varphi(v) = 0$, is a submodule
of the domain $V$, and the image of $\varphi$ is a submodule of the range $W$.

One can extend the quotient construction to modules. Let $W$ be a submodule of an
$R$-module $V$. The quotient module $\bar{V} = V/W$ is the group of additive cosets $\bar{v} = [v + W]$.
It is made into an $R$-module by the rule


(14.1.5) $r\bar{v} = \overline{rv}$.


The main facts about quotient modules are collected together below. 


Theorem 14.1.6 Let $W$ be a submodule of an $R$-module $V$.

(a) The set $\bar{V}$ of additive cosets of $W$ in $V$ is an $R$-module, and the canonical map $\pi: V \to \bar{V}$
sending $v \rightsquigarrow \bar{v} = [v + W]$ is a surjective homomorphism of $R$-modules whose kernel is
$W$.

(b) Mapping property: Let $f: V \to V'$ be a homomorphism of $R$-modules whose kernel $K$
contains $W$. There is a unique homomorphism $\bar{f}: \bar{V} \to V'$ such that $f = \bar{f} \circ \pi$.

(c) First Isomorphism Theorem: Let $f: V \to V'$ be a surjective homomorphism of
$R$-modules whose kernel is equal to $W$. The map $\bar{f}$ defined in (b) is an isomorphism.

(d) Correspondence Theorem: Let $f: V \to \mathcal{V}$ be a surjective homomorphism of $R$-modules,
with kernel $W$. There is a bijective correspondence between submodules of $\mathcal{V}$ and
submodules of $V$ that contain $W$. This correspondence is defined as follows: If $\mathcal{S}$ is
a submodule of $\mathcal{V}$, the corresponding submodule of $V$ is $S = f^{-1}(\mathcal{S})$, and if $S$ is a
submodule of $V$ that contains $W$, the corresponding submodule of $\mathcal{V}$ is $\mathcal{S} = f(S)$. If $S$
and $\mathcal{S}$ are corresponding modules, then $V/S$ is isomorphic to $\mathcal{V}/\mathcal{S}$.


We have seen the analogous facts for rings and ideals, and for groups and normal subgroups. 
The proofs follow the pattern set previously, so we omit them. □


14.2 FREE MODULES 


Free modules form an important class, and we discuss them here. Beginning in Section 14.5, 
we look at other modules. 


• Let $R$ be a ring. An $R$-matrix is a matrix whose entries are in $R$. An invertible $R$-matrix
is an $R$-matrix that has an inverse that is also an $R$-matrix. The $n \times n$ invertible $R$-matrices
form a group called the general linear group over $R$:


(14.2.1) $GL_n(R) = \{n \times n \text{ invertible } R\text{-matrices}\}$.


The determinant of an $R$-matrix $A = (a_{ij})$ can be computed by any one of the rules
described in Chapter 1. The complete expansion (1.6.4), for example, exhibits $\det A$ as a
polynomial in the $n^2$ matrix entries, with coefficients $\pm 1$.


(14.2.2) $\det A = \sum_{p} \pm\, a_{1,p1} \cdots a_{n,pn}$.

As before, the sum is over all permutations $p$ of the indices $\{1, \ldots, n\}$, and the symbol $\pm$
stands for the sign of the permutation. When we evaluate this formula on an $R$-matrix, we
obtain an element of $R$. Rules for the determinant, such as


(det A)(det B) = det (AB), 


continue to hold. We have proved this rule when the matrix entries are in a field (1.4.10), 
and we discuss the reason that such properties are true for R-matrices in the next section. 
Let’s assume for now that they are true. 


Lemma 14.2.3 Let $R$ be a ring, not the zero ring.

(a) A square $R$-matrix $A$ is invertible if and only if it has either a left inverse or a right
inverse, and also if and only if its determinant is a unit of the ring.

(b) An invertible $R$-matrix is square.


Proof. (a) If $A$ has a left inverse $L$, the equation $(\det L)(\det A) = \det I = 1$ shows that $\det A$
has an inverse in $R$, so it is a unit. Similar reasoning shows that $\det A$ is a unit if $A$ has a right
inverse.

If $A$ is an $R$-matrix whose determinant $\delta$ is a unit, Cramer’s Rule: $A^{-1} = \delta^{-1}\operatorname{cof}(A)$,
where $\operatorname{cof}(A)$ is the cofactor matrix (1.6.7), shows that there is an inverse with coefficients
in $R$.


(b) Suppose that an $m \times n$ $R$-matrix $P$ is invertible, i.e., that there is an $n \times m$ $R$-matrix $Q$
such that $PQ = I_m$ and also $QP = I_n$. Interchanging $P$ and $Q$ if necessary, we may suppose
that $m \ge n$. If $m \ne n$, we make $P$ and $Q$ square by adding zeros:


$$\begin{bmatrix} P & 0 \end{bmatrix} \begin{bmatrix} Q \\ 0 \end{bmatrix} = I_m.$$


This does not change the product $PQ$, but the determinants of these square matrices are
zero, so they are not invertible. Therefore $m = n$. □


When $R$ has few units, the fact that the determinant of an invertible matrix must be
a unit is a strong restriction. For instance, if $R$ is the ring of integers, the determinant must
be $\pm 1$. Most integer matrices are invertible when thought of as real matrices, so they are
in $GL_n(\mathbb{R})$. But unless the determinant is $\pm 1$, the entries of the inverse matrix won’t be
integers: they won’t be elements of $GL_n(\mathbb{Z})$. Nevertheless, when $n > 1$, there are many
invertible $n \times n$ $R$-matrices. The elementary matrices $E = I + ae_{ij}$, with $i \ne j$ and $a$ in $R$,
are invertible, and they generate a large group.
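
The restriction on the determinant is easy to see in the $2 \times 2$ case. The sketch below inverts three integer matrices with exact rational arithmetic, using Cramer's Rule, and reports whether the inverse still has integer entries. The sample matrices and the function names are illustrative assumptions; the matrices are chosen to have determinants 1, 3, and 1.

```python
from fractions import Fraction


def det2(m):
    """Determinant of a 2x2 matrix given as a list of two rows."""
    (a, b), (c, e) = m
    return a * e - b * c


def inverse2(m):
    """Inverse of a 2x2 matrix by Cramer's Rule, with exact rational entries."""
    (a, b), (c, e) = m
    delta = det2(m)
    return [[Fraction(e, delta), Fraction(-b, delta)],
            [Fraction(-c, delta), Fraction(a, delta)]]


def is_integer_matrix(m):
    """True if every entry is an integer."""
    return all(Fraction(x).denominator == 1 for row in m for x in row)


A = [[2, 1], [1, 1]]   # determinant 1
B = [[2, 1], [1, 2]]   # determinant 3
E = [[1, 5], [0, 1]]   # elementary matrix I + 5*e_12
for m in (A, B, E):
    print(det2(m), is_integer_matrix(inverse2(m)))
```

The matrices with determinant 1 have integer inverses, so they lie in $GL_2(\mathbb{Z})$. The matrix with determinant 3 is invertible over the real numbers, but its inverse has entries with denominator 3.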


We return to the discussion of modules. The concepts of basis and independence
(Section 3.4) are carried over from vector spaces. An ordered set $(v_1, \ldots, v_k)$ of elements
of a module $V$ is said to generate $V$, or to span $V$, if every element $v$ is a linear
combination:


(14.2.4) $v = r_1v_1 + \cdots + r_kv_k$,


with coefficients in $R$. If this is true, the elements $v_i$ are called generators. A module $V$ is
finitely generated if there exists a finite set of generators. Most of the modules we study will
be finitely generated.

A set of elements $(v_1, \ldots, v_n)$ of a module $V$ is independent if, whenever a linear
combination $r_1v_1 + \cdots + r_nv_n$ with $r_i$ in $R$ is zero, all of the coefficients $r_i$ are zero. A set
$(v_1, \ldots, v_n)$ that generates $V$ and is independent is a basis. As with vector spaces, the set
$(v_1, \ldots, v_n)$ is a basis if every $v$ in $V$ is a linear combination (14.2.4) in a unique way. The
standard basis $\mathbf{E} = (e_1, \ldots, e_k)$ is a basis of $R^k$.

We may also speak of linear combinations and independence of infinite sets, using the 
terminology of Section 3.7. Even when S is infinite, a linear combination can involve only 
finitely many terms. 

We denote an ordered set $(v_1, \ldots, v_n)$ of elements of $V$ by $\mathbf{B}$, as in Chapter 3. Then
multiplication by $\mathbf{B}$,


$$\mathbf{B}X = (v_1, \ldots, v_n)\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = v_1x_1 + \cdots + v_nx_n,$$

defines a homomorphism of modules that we may also denote by $\mathbf{B}$:


(14.2.5) $R^n \xrightarrow{\ \mathbf{B}\ } V$.

As before, the scalars have migrated to the right side. This homomorphism is surjective if
and only if $\mathbf{B}$ generates $V$, injective if and only if $\mathbf{B}$ is independent, and bijective if and only
if $\mathbf{B}$ is a basis. Thus a module $V$ has a basis if and only if it is isomorphic to one of the free
modules $R^k$, and if so, it is called a free module too. A module is free if and only if it has a
basis.

Most modules have no basis. 


A free $\mathbb{Z}$-module is also called a free abelian group. Lattices in $\mathbb{R}^2$ are free abelian groups,
while finite, nonzero abelian groups are not free.

Computation with bases of free modules is done in the same way as with bases of vector
spaces. If $\mathbf{B}$ is a basis of a free module $V$, the coordinate vector of an element $v$, with respect
to $\mathbf{B}$, is the unique column vector $X$ such that $v = \mathbf{B}X$. If two bases $\mathbf{B} = (v_1, \ldots, v_n)$ and
$\mathbf{B}' = (v_1', \ldots, v_r')$ for the same free module $V$ are given, the basechange matrix is obtained
as in Chapter 3, by writing the elements of the new basis as linear combinations of the old
basis: $\mathbf{B}' = \mathbf{B}P$.


Proposition 14.2.6 Let R be a ring that is not the zero ring. 


(a) The matrix P of a change of basis in a free module is an invertible R-matrix. 
(b) Any two bases of the same free module over R have the same cardinality. 


The proof of (a) is the same as the proof of Proposition 3.5.9, and (b) follows from (a)
and from Lemma 14.2.3. □


The number of elements of a basis for a free module V is called the rank of V. The 
rank is analogous to the dimension of a vector space. (Many concepts have different names 
when used for modules over rings.) 

As is true for vector spaces, every homomorphism f between free modules $R^n$ and
$R^m$ is given by left multiplication by an R-matrix A:


(14.2.7) $R^n \xrightarrow{\;A\;} R^m.$

The jth column of A is $f(e_j)$. Similarly, if $\varphi: V \to W$ is a homomorphism of free
R-modules with bases $B = (v_1, \ldots, v_n)$ and $C = (w_1, \ldots, w_m)$, respectively, the matrix of
the homomorphism with respect to B and C is defined to be $A = (a_{ij})$, where

(14.2.8) $\varphi(v_j) = \sum_i w_i a_{ij}.$


If X is the coordinate vector of a vector v, i.e., if v = BX, then Y = AX is the coordinate
vector of its image, i.e., $\varphi(v) = CY$.


(14.2.9)
$$\begin{array}{ccc} R^n & \xrightarrow{\;A\;} & R^m \\ {\scriptstyle B}\downarrow & & \downarrow{\scriptstyle C} \\ V & \xrightarrow{\;\varphi\;} & W \end{array} \qquad\qquad \begin{array}{ccc} X & \rightsquigarrow & Y \\ \downarrow & & \downarrow \\ v & \rightsquigarrow & \varphi(v) \end{array}$$

As is true for linear transformations, a change of the bases B and C by invertible R-matrices
P and Q changes the matrix of $\varphi$ to $A' = Q^{-1}AP$.


14.3 IDENTITIES 


In this section we address the following question: Why do certain properties of matrices with 
entries in a field continue to hold when the entries are in a ring? Briefly, they continue to hold
if they are identities, which means that they are true when the matrix entries are variables. 
To be specific, suppose that we want to prove a formula such as the multiplicative property 
of the determinant, (det A)(det B) = det (AB), or Cramer’s Rule. Suppose we have already 
proved the formula for matrices with complex entries. We don’t want to do the work again, 
and besides, we may have used special properties of C, such as the field axioms, to check 
the formula there. We did use the properties of a field to prove the ones mentioned, so the 
proofs we gave will not work for rings. We show here how to deduce such formulas for all 
rings, once they have been shown for the complex numbers. 

The principle is quite general, but in order to focus attention, we consider the 
multiplicative property (det A)(det B) = det (AB), using the complete expansion (14.2.2) of 
the determinant as its definition. We replace the matrix entries by variables. Denoting by
X and Y indeterminate n × n matrices, the variable identity is (det X)(det Y) = det (XY).
Let’s write


(14.3.1) $f(X, Y) = (\det X)(\det Y) - \det(XY).$


This is a polynomial in the $2n^2$ variable matrix entries $x_{ij}$ and $y_{k\ell}$, an element of the ring
$\mathbb{Z}[\{x_{ij}\}, \{y_{k\ell}\}]$ of integer polynomials in those variables.

Given matrices $A = (a_{ij})$ and $B = (b_{k\ell})$ with entries in a ring R, there is a unique
homomorphism


(14.3.2) $\varphi: \mathbb{Z}[\{x_{ij}\}, \{y_{k\ell}\}] \to R,$


the substitution homomorphism, that sends $x_{ij} \rightsquigarrow a_{ij}$ and $y_{k\ell} \rightsquigarrow b_{k\ell}$.
Referring back to the definition of the determinant, we see that because $\varphi$ is a
homomorphism, it will send


$$f(X, Y) \rightsquigarrow f(A, B) = (\det A)(\det B) - \det(AB).$$


To prove the multiplicative property for matrices in an arbitrary ring, it suffices to prove that
f is the zero element in the polynomial ring $\mathbb{Z}[\{x_{ij}\}, \{y_{k\ell}\}]$. That is what it means to say that
the formula is an identity. If so, then since $\varphi(0) = 0$, it will follow that f(A, B) = 0 for any
matrices A and B in any ring.

Now: If we were to expand f and collect terms, to write it as a linear combination of 
monomials, all coefficients would be zero. However, we don’t know how to do this, nor do 
we want to. To illustrate this point, we look at the 2 × 2 case. In that case,


$$\begin{aligned} f(X, Y) = {} & (x_{11}x_{22} - x_{12}x_{21})(y_{11}y_{22} - y_{12}y_{21}) \\ & - (x_{11}y_{11} + x_{12}y_{21})(x_{21}y_{12} + x_{22}y_{22}) \\ & + (x_{11}y_{12} + x_{12}y_{22})(x_{21}y_{11} + x_{22}y_{21}). \end{aligned}$$

This is the zero polynomial, but it isn’t obvious that it is zero, and we wouldn’t want to make
the computation for larger matrices.

Instead, we reason as follows: Our polynomial determines a function on the space of
$2n^2$ complex variables $\{x_{ij}, y_{k\ell}\}$ by evaluation: If A and B are complex matrices and if we
evaluate f at $\{a_{ij}, b_{k\ell}\}$, we obtain f(A, B) = (det A)(det B) − det(AB). We know that
f(A, B) is equal to zero because our identity is true for complex matrices. So the function
that f determines is identically zero. The only (formal) polynomial that defines the zero
function is the zero polynomial. Therefore f is equal to zero.
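
As a sanity check of the conclusion (not a proof), one can test the multiplicative property in a ring that is not a field. The Python sketch below is an illustration under our own assumptions: it uses the ring Z/6Z, which has zero divisors, computes determinants by the complete expansion, and compares (det A)(det B) with det(AB) for randomly chosen 3 × 3 matrices. The helper names `sign`, `det`, `matmul`, and `random_matrix` are ours.

```python
from itertools import permutations
import random

def sign(perm):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(M, modulus):
    """Determinant by the complete expansion, reduced modulo `modulus`."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total % modulus

def matmul(X, Y, modulus):
    """Product of two square matrices with entries in Z/modulus."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) % modulus
             for j in range(n)] for i in range(n)]

def random_matrix(n, modulus, rng):
    return [[rng.randrange(modulus) for _ in range(n)] for _ in range(n)]

rng = random.Random(0)          # fixed seed, so the run is reproducible
modulus, n = 6, 3               # Z/6Z is a ring with zero divisors
identity_holds = True
for _ in range(200):
    A = random_matrix(n, modulus, rng)
    B = random_matrix(n, modulus, rng)
    lhs = (det(A, modulus) * det(B, modulus)) % modulus
    rhs = det(matmul(A, B, modulus), modulus)
    identity_holds = identity_holds and lhs == rhs
print(identity_holds)
```

The check succeeds for every sample, as the argument above predicts: substitution into the zero polynomial gives zero in any ring.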


It is possible to formalize this discussion and to prove a general theorem about the 
validity of identities in an arbitrary ring. However, even mathematicians occasionally feel 
that formulating a general theorem isn’t worthwhile — that it is easier to consider each case 
as it comes along. This is one of those occasions. 


14.4 DIAGONALIZING INTEGER MATRICES 


We consider the problem mentioned at the beginning of the chapter: Given an m × n integer
matrix A (a matrix whose entries are integers) and an integer column vector B, find the integer
solutions of the system of linear equations


(14.4.1) AX =B. 


Left multiplication by the integer matrix A defines a map $\mathbb{Z}^n \xrightarrow{\;A\;} \mathbb{Z}^m$. Its kernel is the
set of integer solutions of the homogeneous equation AX = 0, and its image is the set of 
integer vectors B such that the equation AX = B has a solution in integers. As usual, all 
solutions of the inhomogeneous equation AX = B can be obtained from a particular one by 
adding solutions of the homogeneous equation. 

When the coefficients are in a field, row reduction is often used to solve linear equations. 
These operations are more restricted here: We should use them only when they are given 
by invertible integer matrices — integer matrices that have integer matrices as their inverses. 
The invertible integer matrices form the integer general linear group $GL_n(\mathbb{Z})$.

The best results will be obtained when we use both row and column operations to 
simplify a matrix. So we allow these operations: 


(14.4.2)
• add an integer multiple of one row to another, or add an integer multiple of one
column to another;
• interchange two rows or two columns;
• multiply a row or column by −1.


Any such operation can be made by multiplying A on the left or right by an elementary 
integer matrix - an elementary matrix that is an invertible integer matrix. The result of a 
sequence of operations will have the form 


(14.4.3) $A' = Q^{-1}AP,$


where Q and P are invertible integer matrices of the appropriate sizes.

Over a field, any matrix can be brought into the block form


$$A' = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}$$


by row and column operations (4.2.10). We can’t hope for such a result when working with
integers: We can’t do it for 1 × 1 matrices. But we can diagonalize.
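An example:


(14.4.4)
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 6 & 6 \end{pmatrix} \xrightarrow{\text{row oper}} \begin{pmatrix} 1 & 2 & 3 \\ 0 & -2 & -6 \end{pmatrix} \xrightarrow{\text{col oper}} \begin{pmatrix} 1 & 0 & 0 \\ 0 & -2 & -6 \end{pmatrix} \xrightarrow{\text{row oper}} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 6 \end{pmatrix} \xrightarrow{\text{col oper}} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \end{pmatrix} = A'.$$

The matrix obtained has the form $A' = Q^{-1}AP$, where Q and P are invertible integer
matrices:


(14.4.5) $Q^{-1} = \begin{pmatrix} 1 & 0 \\ 4 & -1 \end{pmatrix}$ and $P = \begin{pmatrix} 1 & -2 & 3 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{pmatrix}.$


(It is easy to make a mistake when computing these matrices. To compute $Q^{-1}$, the
elementary matrices that produce the row operations multiply in reverse order, while to
compute P one must multiply in the order that the operations are made.)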


Theorem 14.4.6 Let A be an integer matrix. There exist products Q and P of elementary
integer matrices of appropriate sizes, so that $A' = Q^{-1}AP$ is diagonal, say

$$A' = \left(\begin{array}{ccc|c} d_1 & & & \\ & \ddots & & 0 \\ & & d_k & \\ \hline & 0 & & 0 \end{array}\right),$$


where the diagonal entries $d_i$ are positive, and each one divides the next: $d_1 \mid d_2 \mid \cdots \mid d_k$.


Note that the diagonal will not lead to the bottom right corner unless A is a square matrix, 
and if k is less than both m and n, the diagonal will have some zeros at the end. 

We can sum up the information inherent in the four matrices that appear in the theorem
by the diagram


(14.4.7)
$$\begin{array}{ccc} \mathbb{Z}^n & \xrightarrow{\;A'\;} & \mathbb{Z}^m \\ {\scriptstyle P}\downarrow & & \downarrow{\scriptstyle Q} \\ \mathbb{Z}^n & \xrightarrow{\;A\;} & \mathbb{Z}^m \end{array}$$


where the maps are labeled by the matrices that are used to define them.

Proof. We assume A ≠ 0. The strategy is to perform a sequence of operations, so as to end
up with a matrix


(14.4.8)
$$\left(\begin{array}{c|ccc} d_1 & 0 & \cdots & 0 \\ \hline 0 & & & \\ \vdots & & M & \\ 0 & & & \end{array}\right)$$

in which $d_1$ divides every entry of M. When this is done, we work on M. We describe a
systematic method, though it may not be the quickest way to proceed. The method is based
on repeated division with remainder.


Step 1: By permuting rows and columns, we move a nonzero entry with smallest absolute
value to the upper left corner. We multiply the first row by −1 if necessary, so that this upper
left entry $a_{11}$ becomes positive.

Next, we try to clear out the first column. Whenever an operation produces a nonzero
entry in the matrix whose absolute value is smaller than $a_{11}$, we go back to Step 1 and start
the whole process over. This will spoil the work we have done, but progress is made because
$a_{11}$ decreases. We won’t need to return to Step 1 infinitely often.


Step 2: If the first column contains a nonzero entry $a_{i1}$ with i > 1, we divide by $a_{11}$:

$$a_{i1} = a_{11}q + r,$$


where q and r are integers, and the remainder r is in the range $0 \le r < a_{11}$. We subtract
q(row 1) from (row i). This changes $a_{i1}$ to r. If r ≠ 0, we go back to Step 1. If r = 0, we have
produced a zero in the first column.

Finitely many repetitions of Steps 1 and 2 result in a matrix in which $a_{i1} = 0$ for all
i > 1. Similarly, we may use column operations to clear out the first row, eventually ending
up with a matrix in which the only nonzero entry in the first row and the first column is $a_{11}$.


Step 3: Assume that $a_{11}$ is the only nonzero entry in the first row and column, but that some
entry b of M is not divisible by $a_{11}$. We add the column of A that contains b to column 1.
This produces an entry b in the first column. We go back to Step 2. Division with remainder
produces a smaller nonzero matrix entry, sending us back to Step 1. □
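
The three steps translate fairly directly into a program. The following Python function is a minimal sketch under our own conventions, not a tuned implementation: the name `diagonalize`, the list-of-rows representation, and the helper names are assumptions made here. It applies the operations (14.4.2) to a copy of A, records the row operations in a matrix `Qinv` and the column operations in `P`, and returns `(Qinv, D, P)` with `Qinv * A * P = D`.

```python
def diagonalize(A):
    """Return (Qinv, D, P) with Qinv * A * P = D diagonal and d1 | d2 | ...

    A is a nonempty list of rows of integers.  Qinv and P are products of
    elementary integer matrices, recorded as the operations are made.
    """
    m, n = len(A), len(A[0])
    D = [row[:] for row in A]
    Qinv = [[int(i == j) for j in range(m)] for i in range(m)]
    P = [[int(i == j) for j in range(n)] for i in range(n)]

    # Row operations act on D and are recorded in Qinv.
    def add_row(dst, src, c):
        for M in (D, Qinv):
            M[dst] = [x + c * y for x, y in zip(M[dst], M[src])]

    def swap_rows(i, j):
        for M in (D, Qinv):
            M[i], M[j] = M[j], M[i]

    def negate_row(i):
        for M in (D, Qinv):
            M[i] = [-x for x in M[i]]

    # Column operations act on D and are recorded in P.
    def add_col(dst, src, c):
        for M in (D, P):
            for row in M:
                row[dst] += c * row[src]

    def swap_cols(i, j):
        for M in (D, P):
            for row in M:
                row[i], row[j] = row[j], row[i]

    for t in range(min(m, n)):
        while True:
            # Step 1: move a nonzero entry of smallest absolute value to (t, t).
            entries = [(abs(D[i][j]), i, j) for i in range(t, m)
                       for j in range(t, n) if D[i][j] != 0]
            if not entries:
                return Qinv, D, P        # the remaining block is zero
            _, i, j = min(entries)
            swap_rows(t, i)
            swap_cols(t, j)
            if D[t][t] < 0:
                negate_row(t)
            d = D[t][t]
            # Step 2: division with remainder in column t, then in row t.
            for i in range(t + 1, m):
                add_row(i, t, -(D[i][t] // d))
            for j in range(t + 1, n):
                add_col(j, t, -(D[t][j] // d))
            if (any(D[i][t] for i in range(t + 1, m))
                    or any(D[t][j] for j in range(t + 1, n))):
                continue                 # a smaller remainder appeared: Step 1
            # Step 3: look for an entry b of M that d does not divide.
            bad = [j for i in range(t + 1, m) for j in range(t + 1, n)
                   if D[i][j] % d != 0]
            if not bad:
                break
            add_col(t, bad[0], 1)        # bring b into column t, then repeat
    return Qinv, D, P


Qinv, D, P = diagonalize([[1, 2, 3], [4, 6, 6]])
print(D)
```

Applied to the matrix A of (14.4.4), this sketch happens to make the same choices as the hand computation, so it returns the matrices A', $Q^{-1}$, and P shown in (14.4.4) and (14.4.5). For other inputs the matrices Q and P it finds need not agree with ones found by hand, since they are not unique.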


We are now ready to solve the integer linear system AX = B. 


Proposition 14.4.9 Let A be an m × n matrix, and let P and Q be invertible integer matrices
such that $A' = Q^{-1}AP$ has the diagonal form described in Theorem 14.4.6.


(a) The integer solutions of the homogeneous equation A'X' = 0 are the integer vectors X'
whose first k coordinates are zero.


(b) The integer solutions of the homogeneous equation AX = 0 are those of the form
X = PX', where A'X' = 0.

(c) The image W' of multiplication by A' consists of the integer combinations of the vectors
$d_1e_1, \ldots, d_ke_k$.
(d) The image W of multiplication by A consists of the vectors Y = QY', where Y' is in W'.


Proof. (a) Because A' is diagonal, the equation A'X' = 0 reads

$$d_1x'_1 = 0, \quad d_2x'_2 = 0, \quad \ldots, \quad d_kx'_k = 0.$$


In order for X' to solve the diagonal system A'X' = 0, we must have $x'_i = 0$ for i = 1, ..., k,
and $x'_i$ can be arbitrary if i > k.


(c) The image of the map A' is generated by the columns of A', and because A' is diagonal,
the columns are especially simple: $A'_j = d_je_j$ if j ≤ k, and $A'_j = 0$ if j > k.


(b),(d) We regard Q and P as matrices of changes of basis in $\mathbb{Z}^m$ and $\mathbb{Z}^n$, respectively. The
vertical arrows in the diagram 14.4.7 are bijective, so P carries the kernel of A' bijectively to
the kernel of A, and Q carries the image of A' bijectively to the image of A. □


We go back to example (14.4.4). Looking at the matrix A' we see that the solutions
of A'X' = 0 are the integer multiples of $e_3$. So the solutions of AX = 0 are the integer
multiples of $Pe_3$, which is the third column $(3, -3, 1)^t$ of P. The image of A' consists of integer
combinations of the vectors $e_1$ and $2e_2$, and the image of A is obtained by multiplying these
vectors by Q. It happens in this example that $Q = Q^{-1}$. So the image consists of the integer
combinations of the columns of the matrix


$$Q\,(e_1 \;\; 2e_2) = \begin{pmatrix} 1 & 0 \\ 4 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 4 & -2 \end{pmatrix}.$$


Of course, the image of A is also the set of integer combinations of the columns of A, but
those columns do not form a Z-basis.

The solution we have found isn’t unique. A different sequence of row and column 
operations could produce different bases for the kernel and image. But in our example, the 
kernel is spanned by one vector, so that vector is unique up to sign. 
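
The same matrices also settle the inhomogeneous equation AX = B: since $A = QA'P^{-1}$, the equation is equivalent to $A'X' = Q^{-1}B$ with X = PX'. The Python sketch below works this out for the example, with the matrices of (14.4.4) and (14.4.5) typed in by hand. The function names `apply` and `solve` are ours, and the sketch assumes, as is the case here, that k = m, so that there are no zero rows in A' to check.

```python
# Matrices from example (14.4.4) and (14.4.5).
A = [[1, 2, 3], [4, 6, 6]]
Q_inv = [[1, 0], [4, -1]]
P = [[1, -2, 3], [0, 1, -3], [0, 0, 1]]
d = [1, 2]                        # diagonal entries of A'

def apply(M, v):
    """Matrix times column vector, with integer entries."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def solve(B):
    """Return one integer solution X of AX = B, or None if there is none.

    Assumes k = m (no zero rows in A'), which holds in this example.
    """
    Y = apply(Q_inv, B)           # AX = B  <=>  A'X' = Q^{-1}B, with X = PX'
    if any(y % di != 0 for y, di in zip(Y, d)):
        return None               # some d_i does not divide the ith coordinate
    X_prime = [y // di for y, di in zip(Y, d)] + [0] * (len(P) - len(d))
    return apply(P, X_prime)

kernel_generator = apply(P, [0, 0, 1])    # P e_3, the third column of P

X = solve([2, 2])
print(X, apply(A, X))                     # a solution and its image under A
print(solve([0, 1]))                      # None: 4x + 6y + 6z is always even
```

Every other integer solution of AX = B differs from the one returned by an integer multiple of `kernel_generator`.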


Submodules of Free Modules 


The theorem on diagonalization of integer matrices can be used to describe homomorphisms
between free abelian groups.


Corollary 14.4.10 Let $\varphi: V \to W$ be a homomorphism of free abelian groups. There
exist bases of V and W such that the matrix of the homomorphism has the diagonal
form (14.4.6). □


Theorem 14.4.11 Let W be a free abelian group of rank m, and let U be a subgroup of W. 
Then U is a free abelian group, and its rank is less than or equal to m.


Proof. We begin by choosing a basis $C = (w_1, \ldots, w_m)$ for W and a set of generators
$B = (u_1, \ldots, u_n)$ for U. We write $u_j = \sum_i w_i a_{ij}$, and we let $A = (a_{ij})$. The columns of
the matrix A are the coordinate vectors of the generators $u_j$, when computed with respect to
the basis C of W. We obtain a commutative diagram of homomorphisms of abelian groups


(14.4.12)
$$\begin{array}{ccc} \mathbb{Z}^n & \xrightarrow{\;A\;} & \mathbb{Z}^m \\ {\scriptstyle B}\downarrow & & \downarrow{\scriptstyle C} \\ U & \xrightarrow{\;i\;} & W \end{array}$$


where i denotes the inclusion of U into W. Because C is a basis, the right vertical arrow is
bijective, and because B generates U, the left vertical arrow is surjective.


We diagonalize A. With the usual notation $A' = Q^{-1}AP$, we interpret P as the matrix
of a change of basis for $\mathbb{Z}^n$, and Q as the matrix of a change of basis in $\mathbb{Z}^m$. Let the new bases
be C' and B'. Since our original choices of basis C and the generating set B were arbitrary,
we may replace C, B and A by C', B' and A' in the above diagram. So we may assume that
the matrix A has the diagonal form given in (14.4.6). Then $u_j = d_jw_j$ for j = 1, ..., k.


Roughly speaking, this is the proof, but there are still a few points to consider. First,
the diagonal matrix A may contain columns of zeros. A column of zeros corresponds to a
generator $u_j$ whose coordinate vector with respect to the basis C of W is the zero vector. So
$u_j$ is zero too. This vector is useless as a generator, so we throw it out. When we have done
this, all diagonal entries will be positive, and we will have k = n and n ≤ m.


If U is the zero subgroup, we will end up throwing out all the generators. As with
vector spaces, we must agree that the empty set is a basis for the zero module, or else
mention this exceptional case in the statement of the theorem.


We assume that the m × n matrix A is diagonal, with positive diagonal entries
$d_1, \ldots, d_n$ and with n ≤ m, and we show that the set $(u_1, \ldots, u_n)$ is a basis of U. Since this
set generates U, what has to be proved is that it is independent. We write a linear relation
$a_1u_1 + \cdots + a_nu_n = 0$ in the form $a_1d_1w_1 + \cdots + a_nd_nw_n = 0$. Since $(w_1, \ldots, w_m)$ is a
basis, $a_id_i = 0$ for each i, and since $d_i > 0$, $a_i = 0$.

The final point is more serious: We needed a finite set of generators of U to get started. 
How do we know that there is such a set? It is a fact that every subgroup of a finitely 
generated abelian group is finitely generated. We prove this in Section 14.6. For the moment, 
the theorem is proved only with the additional hypothesis that U is finitely generated. □


Suppose that a lattice L in $\mathbb{R}^2$ with basis $B = (v_1, v_2)$ is a sublattice of the lattice M
with the basis $C = (u_1, u_2)$, and let A be the integer matrix such that B = CA. If we change
bases in L and M, the matrix A will be changed to a matrix $A' = Q^{-1}AP$, where P and Q are
invertible integer matrices. According to Theorem 14.4.6, bases can be chosen so that A is
diagonal, with positive diagonal entries $d_1$ and $d_2$. Suppose that this has been done. Then if
$B = (v_1, v_2)$ and $C = (u_1, u_2)$, the equation B = CA reads $v_1 = d_1u_1$ and $v_2 = d_2u_2$.


Example 14.4.13 Let $A = \begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}$, $P = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$, $Q = \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix}$. Then $A' = Q^{-1}AP = \begin{pmatrix} 1 & 0 \\ 0 & 5 \end{pmatrix}$.


Let M be the integer lattice with its standard basis $C = (e_1, e_2)$, and let L be the lattice
with basis $B = (v_1, v_2) = ((2, 1)^t, (-1, 2)^t)$. Its coordinate vectors are the columns of A. We
interpret P as the matrix of a change of basis in L, and Q as the matrix of change of basis
in M. In coordinate vector form, the new bases are $C' = (e_1, e_2)Q = ((1, 3)^t, (0, 1)^t)$ and
$B' = (v_1, v_2)P = ((1, 3)^t, (0, 5)^t)$.

The left-hand figure below shows the squares spanned by the two original bases,
and the figure on the right shows the parallelograms spanned by the two new bases.
The parallelogram spanned by the new basis for L is filled precisely by five translates
of the shaded parallelogram, which is the parallelogram spanned by the new basis for
M. The index is 5. Note that there are five lattice points in the region $\Pi'(v_1, v_2)$. This
agrees with Proposition 13.10.3(d). The figure on the right also makes it clear that the ratio
$\Delta(L)/\Delta(M)$ is 5. □


[Figure: on the left, the squares spanned by the two original bases; on the right, the parallelograms spanned by the two new bases.]
(14.4.14) Diagonalization, Applied to a Sublattice.
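
The count of five lattice points can be confirmed by enumeration. The Python sketch below is an illustration under our own assumptions: it takes the original basis of L from Example 14.4.13, interprets the region as the half-open parallelogram of points $sv_1 + tv_2$ with $0 \le s, t < 1$, and lists the points of the integer lattice M that lie in it. The function name and the search window are ours. It uses integer arithmetic only, solving for s and t by Cramer's Rule, and it relies on the determinant being positive.

```python
# Basis of L from Example 14.4.13, written in the standard basis of M = Z^2.
v1, v2 = (2, 1), (-1, 2)
det = v1[0] * v2[1] - v2[0] * v1[1]       # determinant of A = (v1 v2)

def in_half_open_parallelogram(x, y):
    """Is (x, y) = s*v1 + t*v2 with 0 <= s, t < 1?  Assumes det > 0."""
    s_num = x * v2[1] - y * v2[0]         # s = s_num / det  (Cramer's Rule)
    t_num = v1[0] * y - v1[1] * x         # t = t_num / det
    return 0 <= s_num < det and 0 <= t_num < det

# The window -3..3 is large enough to contain the whole parallelogram.
points = [(x, y) for x in range(-3, 4) for y in range(-3, 4)
          if in_half_open_parallelogram(x, y)]
print(det, points)
```

The number of points found equals the determinant of A, which is also the product $d_1d_2 = 1 \cdot 5$ of the diagonal entries of A'.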


14.5 GENERATORS AND RELATIONS 


In this section we turn our attention to modules that are not free. We show how to describe 
a large class of modules by means of matrices called presentation matrices. 


Left multiplication by an m × n R-matrix A defines a homomorphism of R-modules
$R^n \xrightarrow{\;A\;} R^m$. Its image consists of all linear combinations of the columns of A with
coefficients in the ring, and we may denote the image by $AR^n$. We say that the quotient
module $V = R^m/AR^n$ is presented by the matrix A. More generally, we call any isomorphism
$\sigma: R^m/AR^n \to V$ a presentation of a module V, and we say that the matrix A is a presentation
matrix for V if there is such an isomorphism.

For example, the cyclic group $C_5$ of order 5 is presented as a Z-module by the 1 × 1
integer matrix [5], because $C_5$ is isomorphic to Z/5Z.

We use the canonical map $\pi: R^m \to V = R^m/AR^n$ (14.1.6) to interpret the quotient
module $V = R^m/AR^n$, as follows:




Proposition 14.5.1

(a) V is generated by a set of elements $B = (v_1, \ldots, v_m)$, the images of the standard basis
elements of $R^m$.

(b) If $Y = (y_1, \ldots, y_m)^t$ is a column vector in $R^m$, the element $BY = v_1y_1 + \cdots + v_my_m$
of V is zero if and only if Y is a linear combination of the columns of A, with coefficients
in R; that is, if and only if there exists a column vector X with entries in R such that Y = AX.


Proof. The images of the standard basis elements generate V because the map $\pi$ is
surjective. Its kernel is the submodule $AR^n$. This submodule consists precisely of the linear
combinations of the columns of A. □


• If a module V is generated by a set $B = (v_1, \ldots, v_m)$, we call an element Y of $R^m$ such
that BY = 0 a relation vector, or simply a relation among the generators. We may also refer
to the equation $v_1y_1 + \cdots + v_my_m = 0$ as a relation, meaning that the left side yields 0
when it is evaluated in V. A set S of relations is a complete set if every relation is a linear
combination of S with coefficients in the ring.


Example 14.5.2. The Z-module, or abelian group, V that is generated by three elements
$v_1, v_2, v_3$ with the complete set of relations


(14.5.3)
$$\begin{aligned} 3v_1 + 2v_2 + v_3 &= 0 \\ 8v_1 + 4v_2 + 2v_3 &= 0 \\ 7v_1 + 6v_2 + 2v_3 &= 0 \\ 9v_1 + 6v_2 + v_3 &= 0 \end{aligned}$$

is presented by the matrix

(14.5.4) $A = \begin{pmatrix} 3 & 8 & 7 & 9 \\ 2 & 4 & 6 & 6 \\ 1 & 2 & 2 & 1 \end{pmatrix}.$

Its columns are the coefficients of the relations (14.5.3):
$(v_1, v_2, v_3)A = (0, 0, 0, 0)$. □


We now describe a theoretical method of finding a presentation of an R-module V.
The method is very simple: We choose a set of generators $B = (v_1, \ldots, v_m)$ for V. These
generators provide us with a surjective homomorphism $R^m \to V$ that sends a column vector
Y to the linear combination $BY = v_1y_1 + \cdots + v_my_m$. Let us denote the kernel of this map
by W. It is the module of relations; its elements are the relation vectors.

We repeat the procedure, choosing a set of generators $C = (w_1, \ldots, w_n)$ for W, and
we use these generators to define a surjective map $R^n \to W$. But here the generators $w_j$
are elements of $R^m$. They are column vectors. We assemble the coordinate vectors $A_j$ of $w_j$
into an m × n matrix


(14.5.5) $A = \begin{pmatrix} | & & | \\ A_1 & \cdots & A_n \\ | & & | \end{pmatrix}.$

Then multiplication by A defines a map
$$R^n \xrightarrow{\;A\;} R^m$$
that sends $e_j \rightsquigarrow A_j = w_j$. It is the composition of the map $R^n \to W$ with the inclusion
$W \subset R^m$. By construction, W is its image, and we denote it by $AR^n$.
Since the map $R^m \to V$ is surjective, the First Isomorphism Theorem tells us that V is
isomorphic to $R^m/W = R^m/AR^n$. Therefore the module V is presented by the matrix A.
Thus the presentation matrix A for a module V is determined by 


(14.5.6)
• a set of generators for V, and
• a set of generators for the module of relations W.

Unless the set of generators forms a basis of V, in which case A is empty, the number of 
generators will be equal to the number of rows of A. 

This construction depends on two assumptions: We must assume that our module V 
has a finite set of generators. Fair enough: We can’t expect to describe a module that is too 
big, such as an infinite dimensional vector space, in this way. We must also assume that the 
module W of relations has a finite set of generators. This is a less desirable assumption
because W is not given: it is an auxiliary module that was obtained in the course of the
construction. We need to examine this point more closely, and we do this in the next section
(see (14.6.5)). But except for this point, we can now speak of generators and relations for a
finitely generated R-module V.

Since the presentation matrix depends on the choices (14.5.6), many matrices present 
the same module, or isomorphic modules. Here are some rules for manipulating a matrix A 
without changing the isomorphism class of the module it presents: 


Proposition 14.5.7 Let A be an m × n presentation matrix for a module V. The following
matrices A' present the same module V:
(i) $A' = Q^{-1}A$, with Q in $GL_m(R)$;
(ii) $A' = AP$, with P in $GL_n(R)$;
(iii) A' is obtained by deleting a column of zeros;
(iv) the jth column of A is $e_i$, and A' is obtained from A by deleting (row i) and
(column j).


The operations (iii) and (iv) can also be done in reverse. One can add a column of zeros, or
one can add a new row and column with 1 as their common entry, all other entries being zero.


Proof. We refer to the map $R^n \xrightarrow{\;A\;} R^m$ defined by the matrix.


(i) The change of A to $Q^{-1}A$ corresponds to a change of basis in $R^m$.
(ii) The change of A to AP corresponds to a change of basis in $R^n$.
(iii) A column of zeros corresponds to the trivial relation, which can be omitted.


(iv) A column of A equal to $e_i$ corresponds to the relation $v_i = 0$. The zero element is
useless as a generator, and its appearance in any other relation is irrelevant. So we may
delete $v_i$ from the generating set and from the relations. Doing so changes the matrix
A by deleting the ith row and jth column. □

It may be possible to simplify a matrix quite a lot by these rules. For instance, our
original example of the integer matrix (14.5.4) reduces as follows:


$$A = \begin{pmatrix} 3 & 8 & 7 & 9 \\ 2 & 4 & 6 & 6 \\ 1 & 2 & 2 & 1 \end{pmatrix} \to \begin{pmatrix} 0 & 2 & 1 & 6 \\ 0 & 0 & 2 & 4 \\ 1 & 2 & 2 & 1 \end{pmatrix} \to \begin{pmatrix} 2 & 1 & 6 \\ 0 & 2 & 4 \end{pmatrix} \to \begin{pmatrix} 0 & 1 & 0 \\ -4 & 2 & -8 \end{pmatrix}$$
$$\to \begin{pmatrix} 0 & 1 & 0 \\ -4 & 0 & -8 \end{pmatrix} \to \begin{pmatrix} -4 & -8 \end{pmatrix} \to \begin{pmatrix} 4 & 0 \end{pmatrix} \to \begin{pmatrix} 4 \end{pmatrix}.$$


Thus A presents the abelian group Z/4Z.
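
A partial check of this conclusion can be made by brute force. The Python sketch below is an illustration under our own assumptions: it searches for triples $(v_1, v_2, v_3)$ of elements of Z/4Z that satisfy all four relations (14.5.3), and it picks out the triples that generate Z/4Z. The function names are ours. Such a triple shows only that Z/4Z is a quotient of the group presented by A; that the two groups are isomorphic is what the reduction above proves.

```python
from itertools import product

# Columns of the presentation matrix (14.5.4) are the relation vectors.
A = [[3, 8, 7, 9],
     [2, 4, 6, 6],
     [1, 2, 2, 1]]
relations = list(zip(*A))           # (3,2,1), (8,4,2), (7,6,2), (9,6,1)
n = 4                               # the target group is Z/4Z

def satisfies_relations(v):
    """Do the elements v = (v1, v2, v3) of Z/nZ satisfy every relation?"""
    return all(sum(c * x for c, x in zip(rel, v)) % n == 0
               for rel in relations)

def generates(v):
    """Do v1, v2, v3 generate Z/nZ?  True when their span is everything."""
    span = {sum(c * x for c, x in zip(coeffs, v)) % n
            for coeffs in product(range(n), repeat=len(v))}
    return len(span) == n

solutions = [v for v in product(range(n), repeat=3) if satisfies_relations(v)]
generating = [v for v in solutions if generates(v)]
print(solutions)
print(generating)
```

For instance, the triple (2, 1, 0) satisfies the relations and generates Z/4Z, so sending $v_1, v_2, v_3$ to 2, 1, 0 defines a surjective homomorphism from the presented group to Z/4Z.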

By definition, an m × n matrix presents a module by means of m generators and n
relations. But as we see from this example, the numbers m and n depend on choices; they
are not uniquely determined by the module.


Another example: The 2 × 1 matrix $\begin{pmatrix} 4 \\ 0 \end{pmatrix}$ presents an abelian group V by means of two
generators $(v_1, v_2)$ and one relation $4v_1 = 0$. We can’t simplify this matrix. The abelian
group that it presents is the direct sum Z/4Z ⊕ Z of a cyclic group of order four and an
infinite cyclic group (see Section 14.7). On the other hand, as we saw above, the matrix
[4 0] presents a group with one generator $v_1$ and two relations, the second of which is the
trivial relation. It is a cyclic group of order 4.


14.6 NOETHERIAN RINGS 


In this section we discuss finite generation of the module of relations. For modules over a 
nasty ring, the module of relations needn’t be finitely generated, though V is. Fortunately 
this doesn’t occur with the rings we have been studying, as we show here. 


Proposition 14.6.1 The following conditions on an R-module V are equivalent: 


(i) Every submodule of V is finitely generated; 
(ii) ascending chain condition: There is no infinite strictly increasing chain
$W_1 < W_2 < \cdots$ of submodules of V.


Proof. Assume that V satisfies the ascending chain condition, and let W be a submodule of
V. We select a set of generators of W in the following way: If W = 0, then W is generated by
the empty set. If not, we start with a nonzero element $w_1$ of W, and we let $W_1$ be the span of
$(w_1)$. If $W_1 = W$ we stop. If not, we choose an element $w_2$ of W not in $W_1$, and we let $W_2$
be the span of $(w_1, w_2)$. Then $W_1 < W_2$. If $W_2 < W$, we choose an element $w_3$ not in $W_2$,
etc. In this way we obtain a strictly increasing chain $W_1 < W_2 < \cdots$ of submodules of W.
Since V satisfies the ascending chain condition, this chain cannot be continued indefinitely.
Therefore some $W_k$ is equal to W, and then $(w_1, \ldots, w_k)$ generates W.


The proof of the converse is similar to the proof of Proposition 12.2.13, which states that
factoring terminates in a domain if and only if it has no strictly increasing chain of principal
ideals. Assume that every submodule of V is finitely generated, and let $W_1 \subset W_2 \subset \cdots$
be an infinite weakly increasing chain of submodules of V. We show that this chain is not
strictly increasing. Let U denote the union of these submodules. Then U is a submodule of
V. The proof is the same as the one given for ideals (12.2.15). So U is finitely generated. Let
$(u_1, \ldots, u_r)$ be a set of generators for U. Each $u_\nu$ is in one of the modules $W_i$, and since the
chain is increasing, there is an i such that $W_i$ contains all of the elements $u_1, \ldots, u_r$. Then
$W_i$ contains the module U generated by $(u_1, \ldots, u_r)$: $U \subset W_i \subset W_{i+1} \subset U$. This shows that
$U = W_i = W_{i+1}$, and that the chain is not strictly increasing. □
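
The selection of generators in the first half of the proof can be watched in the ring Z, where every ideal has the form gZ and the ideal generated by integers $w_1, \ldots, w_k$ is generated by their greatest common divisor. The Python sketch below is only an illustration under our own assumptions: the function name is ours, and the subgroup W is described by a finite list of elements, which a general submodule would not be. It picks a new generator whenever the current $W_i$ is not yet all of W, and records the chain.

```python
from math import gcd

def generator_chain(elements):
    """Pick generators one at a time; return the chosen w_i and the chain W_i.

    Each ideal W_i of Z is recorded by its nonnegative generator g_i, so
    W_i = g_i Z, and W_i < W_{i+1} exactly when g_{i+1} properly divides g_i.
    """
    chosen, chain = [], []
    g = 0                                # generator of the zero ideal
    for w in elements:
        if w == 0:
            continue                     # zero is useless as a generator
        if g != 0 and w % g == 0:
            continue                     # w already lies in the current W_i
        g = gcd(g, w)
        chosen.append(w)
        chain.append(g)
    return chosen, chain

chosen, chain = generator_chain([84, 60, 126, 36, 18])
print(chosen, chain)
```

Here the chain is 84Z < 12Z < 6Z, and it stops because each step replaces the generator by a proper divisor, which can happen only finitely often.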


Definition 14.6.2. A ring R is noetherian if every ideal of R is finitely generated. 


Corollary 14.6.3 A ring is noetherian if and only if it satisfies the ascending chain condition:
There is no infinite strictly increasing chain $I_1 < I_2 < \cdots$ of ideals of R. □


Principal ideal domains are noetherian because every ideal in such a ring is generated
by one element. So the rings Z, Z[i], and F[x], with F a field, are noetherian.


Corollary 14.6.4 Let R be a noetherian ring. Every proper ideal I of R is contained in a
maximal ideal.


Proof. If I is not maximal itself, then it is properly contained in a proper ideal $I_2$, and if $I_2$
is not maximal, it is properly contained in a proper ideal $I_3$, and so on. By the ascending
chain condition (14.6.1), the chain $I < I_2 < I_3 \cdots$ must be finite. Therefore $I_k$ is maximal for
some k. □


The relevance of the concept of a noetherian ring to the problem of finite generation of a 
submodule is shown by the following theorem: 


Theorem 14.6.5 Let R be a noetherian ring. Every submodule of a finitely generated 
R-module V is finitely generated. 


Proof. Case 1: $V = R^m$. We use induction on m. A submodule of $R^1$ is an ideal of R
(14.1.3). Since R is noetherian, the theorem is true when m = 1. Suppose that m > 1. We
consider the projection
$$\pi: R^m \to R^{m-1}$$

given by dropping the last entry: $\pi(a_1, \ldots, a_m) = (a_1, \ldots, a_{m-1})$. Its kernel is the set of
vectors of $R^m$ whose first m − 1 coordinates are zero. Let W be a submodule of $R^m$, and let
$\varphi: W \to R^{m-1}$ be the restriction of $\pi$ to W. The image $\varphi(W)$ is a submodule of $R^{m-1}$. It is
finitely generated by induction. Also, $\ker \varphi = (W \cap \ker \pi)$ is a submodule of $\ker \pi$, which
is a module isomorphic to $R^1$. So $\ker \varphi$ is finitely generated. Lemma 14.6.6 shows that W is
finitely generated.


Case 2: The general case. Let V be a finitely generated R-module. Then there is a surjective
map $\varphi: R^m \to V$ from a free module to V. Given a submodule W of V, the Correspondence
Theorem tells us that $U = \varphi^{-1}(W)$ is a submodule of the module $R^m$, so it is finitely
generated, and $W = \varphi(U)$. Therefore W is finitely generated (14.6.6)(a). □


Lemma 14.6.6 Let $\varphi: V \to V'$ be a homomorphism of R-modules.


(a) If V is finitely generated and $\varphi$ is surjective, then V' is finitely generated.
(b) If the kernel and the image of $\varphi$ are finitely generated, then V is finitely generated.
(c) Let W be a submodule of an R-module V. If both W and $\overline{V} = V/W$ are finitely
generated, then V is finitely generated. If V is finitely generated, so is $\overline{V}$.


Proof. (a) Suppose that $\varphi$ is surjective and let $(v_1, \ldots, v_n)$ be a set of generators for V.
The set $(v'_1, \ldots, v'_n)$, with $v'_i = \varphi(v_i)$, generates V'.


(b) We follow the proof of the dimension formula for linear transformations (4.1.5). We
choose a set of generators $(u_1, \ldots, u_k)$ for the kernel and a set of generators $(v'_1, \ldots, v'_m)$
for the image. We also choose elements $v_i$ of V such that $\varphi(v_i) = v'_i$, and we show that the
set $(u_1, \ldots, u_k; v_1, \ldots, v_m)$ generates V. Let v be any element of V. Then $\varphi(v)$ is a linear
combination of $(v'_1, \ldots, v'_m)$, say $\varphi(v) = a_1v'_1 + \cdots + a_mv'_m$. Let $x = a_1v_1 + \cdots + a_mv_m$.
Then $\varphi(x) = \varphi(v)$, hence v − x is in the kernel of $\varphi$. So v − x is a linear combination of
$(u_1, \ldots, u_k)$, say $v - x = b_1u_1 + \cdots + b_ku_k$, and


$$v = a_1v_1 + \cdots + a_mv_m + b_1u_1 + \cdots + b_ku_k.$$


Since v was arbitrary, the set $(u_1, \ldots, u_k; v_1, \ldots, v_m)$ generates.


(c) This follows from (b) and (a) when we replace $\varphi$ by the canonical homomorphism
$\pi: V \to \overline{V}$. □


This theorem completes the proof of Theorem 14.4.11. 

Since principal ideal domains are noetherian, submodules of finitely generated modules 
over these rings are finitely generated. In fact, most of the rings that we have been studying 
are noetherian. This follows from another of Hilbert’s theorems: 


Theorem 14.6.7 Hilbert Basis Theorem. Let R be a noetherian ring. The polynomial ring 
R[x] is noetherian. 


The proof of this theorem is below. It shows by induction that the polynomial ring
$R[x_1, \ldots, x_n]$ in several variables over a noetherian ring R is noetherian. Therefore the
rings $\mathbb{Z}[x_1, \ldots, x_n]$ and $F[x_1, \ldots, x_n]$, with F a field, are noetherian. Also, quotients of
noetherian rings are noetherian:


Proposition 14.6.8 Let R be a noetherian ring, and let I be an ideal of R. Any ring that is
isomorphic to the quotient ring $\overline{R} = R/I$ is noetherian.


Proof. Let $\overline{J}$ be an ideal of $\overline{R}$, and let $\pi: R \to \overline{R}$ be the canonical map. Let $J = \pi^{-1}(\overline{J})$ be
the corresponding ideal of R. Since R is noetherian, J is finitely generated, and it follows
that $\overline{J}$ is finitely generated (14.6.6)(a). □


Corollary 14.6.9 Let P be a polynomial ring in a finite number of variables over the integers
or over a field. Any ring R that is isomorphic to a quotient ring P/I is noetherian. □


We turn to the proof of the Hilbert Basis Theorem now. 


Lemma 14.6.10 Let R be a ring and let I be an ideal of the polynomial ring R[x]. The set A
whose elements are the leading coefficients of the nonzero polynomials in I, together with
the zero element of R, is an ideal of R, the ideal of leading coefficients.




Proof. We must show that if $\alpha$ and $\beta$ are in A, then $\alpha + \beta$ and $r\alpha$ are also in A. If any one of the
three elements $\alpha$, $\beta$, or $\alpha + \beta$ is zero, then $\alpha + \beta$ is in A, so we may assume that these elements
are not zero. Then $\alpha$ is the leading coefficient of an element f of I, and $\beta$ is the leading
coefficient of an element g of I. We multiply f or g by a suitable power of x so that their
degrees become equal. The polynomial we get is also in I. Then $\alpha + \beta$ is the leading coefficient
of f + g. Since I is an ideal, f + g is in I and $\alpha + \beta$ is in A. The proof that $r\alpha$ is in A is similar.


Proof of the Hilbert Basis Theorem. We suppose that $R$ is a noetherian ring, and we let $I$
be an ideal in the polynomial ring $R[x]$. We must show that there is a finite subset $S$ of $I$
that generates this ideal, that is, a subset such that every element of $I$ can be expressed as a
linear combination of its elements, with polynomial coefficients.


Let $A$ be the ideal of leading coefficients of $I$. Since $R$ is noetherian, $A$ has a finite set
of generators, say $(\alpha_1, \ldots, \alpha_k)$. We choose for each $i = 1, \ldots, k$ a polynomial $f_i$ in $I$ with
leading coefficient $\alpha_i$, and we multiply these polynomials by powers of $x$ as necessary, so
that their degrees become equal, say to $n$.

Next, let $P$ denote the set consisting of the polynomials in $R[x]$ of degree less than
$n$, together with 0. This is a free $R$-module with basis $(1, x, \ldots, x^{n-1})$. The subset $P \cap I$,
which consists of the polynomials of degree less than $n$ that are in $I$ together with zero, is an
$R$-submodule of $P$. Let’s call this submodule $W$. Since $P$ is a finitely generated $R$-module
and since $R$ is noetherian, $W$ is a finitely generated $R$-module. We choose generators
$(h_1, \ldots, h_\ell)$ for $W$. Every polynomial in $I$ of degree less than $n$ is a linear combination of
$(h_1, \ldots, h_\ell)$, with coefficients in $R$.


We show now that the set $(f_1, \ldots, f_k; h_1, \ldots, h_\ell)$ generates the ideal $I$, i.e., that every
element $g$ of $I$ is a linear combination of this set, with coefficients in $R[x]$. We use
induction on the degree $d$ of $g$.


Case 1: $d < n$. In this case, $g$ is an element of $W$, so it is a linear combination of $(h_1, \ldots, h_\ell)$
with coefficients in $R$. We don’t need polynomial coefficients here.


Case 2: $d \ge n$. Let $\beta$ be the leading coefficient of $g$, so $g = \beta x^d + (\text{lower degree terms})$.
Then $\beta$ is an element of the ideal $A$ of leading coefficients, so it is a linear combination
$\beta = r_1\alpha_1 + \cdots + r_k\alpha_k$ of the leading coefficients $\alpha_i$ of $f_i$, with coefficients $r_i$ in $R$. The
polynomial

$$q = \sum_{i=1}^{k} r_i x^{d-n} f_i$$

is in the ideal generated by $(f_1, \ldots, f_k)$. It has degree $d$, and its leading coefficient is $\beta$.
Therefore the degree of $g - q$ is less than $d$. By induction, $g - q$ is a polynomial combination
of $(f_1, \ldots, f_k; h_1, \ldots, h_\ell)$. Then $g = q + (g - q)$ is also such a combination. $\square$
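
To see the two cases of the induction at work, here is a small worked instance. The ring $R = \mathbb{Z}$, the ideal $I = (2, x)$, and the element $g$ are choices made only for this illustration; they do not come from the text.

```latex
% Assumed example: R = \mathbb{Z}, I = (2, x) in \mathbb{Z}[x], g = 3x^2 + 4x + 6.
\begin{align*}
&A = \mathbb{Z} = (\alpha_1), \quad \alpha_1 = 1, \quad f_1 = x, \quad n = 1; \\
&W = P \cap I = 2\mathbb{Z}, \quad h_1 = 2; \\
&g = 3x^2 + 4x + 6: \quad d = 2,\ \beta = 3,\ q = 3x^{d-n} f_1 = 3x^2, \quad g - q = 4x + 6; \\
&g - q = 4x + 6: \quad d = 1,\ \beta = 4,\ q' = 4x^{0} f_1 = 4x, \quad (g - q) - q' = 6 = 3h_1; \\
&g = (3x + 4)\, f_1 + 3\, h_1 .
\end{align*}
```

The first two steps fall under Case 2 and use a polynomial coefficient; the final constant falls under Case 1 and needs only an integer coefficient.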


14.7. STRUCTURE OF ABELIAN GROUPS 


The Structure Theorem for abelian groups, which is below, asserts that a finite abelian group 
V is a direct sum of cyclic groups. The work of the proof has been done. We know that there 
exists a diagonal presentation matrix for V. What remains to do is to interpret the meaning 
of this matrix for the group. 

The definition of a direct sum of modules is the same as that of a direct sum of vector
spaces.

• Let $W_1, \ldots, W_k$ be submodules of an $R$-module $V$. Their sum is the submodule that they
generate. It consists of all elements that are sums:


(14.7.1) $W_1 + \cdots + W_k = \{v \in V \mid v = w_1 + \cdots + w_k, \text{ with } w_i \text{ in } W_i\}$.


We say that $V$ is the direct sum of the submodules $W_1, \ldots, W_k$, and we write
$V = W_1 \oplus \cdots \oplus W_k$, if


(14.7.2)
• they generate: $V = W_1 + \cdots + W_k$, and
• they are independent: If $w_1 + \cdots + w_k = 0$, with $w_i$ in $W_i$, then $w_i = 0$ for all $i$.


Thus $V$ is the direct sum of the submodules $W_i$ if every element $v$ in $V$ can be written
uniquely in the form $v = w_1 + \cdots + w_k$, with $w_i$ in $W_i$. As is true for vector spaces, a module
$V$ is the direct sum $W_1 \oplus W_2$ of two submodules $W_1$ and $W_2$ if and only if $W_1 + W_2 = V$
and $W_1 \cap W_2 = 0$ (see (3.6.6)).

The same definitions are used for abelian groups. An abelian group $V$ is the direct sum
$W_1 \oplus \cdots \oplus W_k$ of the subgroups $W_1, \ldots, W_k$ if:

• Every element $v$ of $V$ can be written as a sum $v = w_1 + \cdots + w_k$, with $w_i$ in $W_i$, i.e.,
  $V = W_1 + \cdots + W_k$.
• If a sum $w_1 + \cdots + w_k$, with $w_i$ in $W_i$, is zero, then $w_i = 0$ for all $i$.


Theorem 14.7.3 Structure Theorem for Abelian Groups. A finitely generated abelian group
$V$ is a direct sum of cyclic subgroups $C_{d_1}, \ldots, C_{d_k}$ and a free abelian group $L$:


$$V = C_{d_1} \oplus \cdots \oplus C_{d_k} \oplus L,$$


where the order $d_i$ of $C_{d_i}$ is greater than 1, and $d_i$ divides $d_{i+1}$ for $i = 1, \ldots, k - 1$.


Proof of the Structure Theorem. We choose a presentation matrix A for V, determined by 
a set of generators and a complete set of relations. We can do this because V is finitely 
generated and because Z is a Noetherian ring. After a suitable change of generators and 
relations, A will have the diagonal form given in Theorem 14.4.6. We may eliminate any 
diagonal entry that is equal to 1, and any column of zeros (see (14.5.7)). The matrix A will 
then have the shape 


(14.7.4)
$$A = \begin{pmatrix} d_1 & & \\ & \ddots & \\ & & d_k \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{pmatrix}$$


with $d_1 > 1$ and $d_1 \mid d_2 \mid \cdots \mid d_k$. It will be an $m \times k$ matrix, $0 \le k \le m$. The meaning of this for
our abelian group is that $V$ is generated by a set of $m$ elements $\mathbf{B} = (v_1, \ldots, v_m)$, and that


(14.7.5) $d_1 v_1 = 0, \ \ldots, \ d_k v_k = 0$


forms a complete set of relations among these generators.

Let $C_j$ denote the cyclic subgroup generated by $v_j$, for $j = 1, \ldots, m$. For $j \le k$, $C_j$
is cyclic of order $d_j$, and for $j > k$, $C_j$ is infinite cyclic. We show that $V$ is the direct sum
of these cyclic groups. Since $\mathbf{B}$ generates, $V = C_1 + \cdots + C_m$. Suppose given a relation
$w_1 + \cdots + w_m = 0$ with $w_j$ in $C_j$. Since $v_j$ generates $C_j$, $w_j = v_j y_j$ for some integer $y_j$.
The relation is $\mathbf{B}Y = v_1 y_1 + \cdots + v_m y_m = 0$. Since the columns of $A$ form a complete set
of relations, $Y = AX$ for some integer vector $X$, which means that $y_j$ is a multiple of $d_j$ if
$j \le k$ and $y_j = 0$ if $j > k$. Since $v_j d_j = 0$ if $j \le k$, $w_j = 0$ if $j \le k$, and $w_j = 0$ if $j > k$
because $y_j = 0$. The relation is trivial, so the cyclic groups $C_j$ are independent. The direct
sum of the infinite cyclic groups $C_j$ with $j > k$ is the free abelian group $L$. $\square$
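
The procedure in this proof can be tried out on small examples. The following is a minimal Python sketch, not the text's own algorithm statement: it diagonalizes an integer presentation matrix by row and column operations with division with remainder, and then reads off the cyclic orders and the free rank. The function names `diagonal_entries` and `abelian_group` are made up for this illustration, and the sketch assumes that rows correspond to generators and columns to relations.

```python
def diagonal_entries(A):
    """Diagonalize an integer matrix by integer row and column operations.

    Returns the nonzero diagonal entries d_1 | d_2 | ... as positive integers.
    """
    A = [list(row) for row in A]           # work on a copy
    m, n = len(A), len(A[0])
    diag, t = [], 0
    while t < min(m, n):
        # Locate a nonzero entry of least absolute value in the block A[t:, t:].
        entries = [(abs(A[i][j]), i, j)
                   for i in range(t, m) for j in range(t, n) if A[i][j] != 0]
        if not entries:
            break                           # the remaining block is zero
        _, i, j = min(entries)
        A[t], A[i] = A[i], A[t]             # row swap
        for row in A:                       # column swap
            row[t], row[j] = row[j], row[t]
        p = A[t][t]
        # Division with remainder: reduce column t, then row t, modulo the pivot.
        for i in range(t + 1, m):
            q = A[i][t] // p
            A[i] = [a - q * b for a, b in zip(A[i], A[t])]
        for j in range(t + 1, n):
            q = A[t][j] // p
            for i in range(m):
                A[i][j] -= q * A[i][t]
        if any(A[i][t] for i in range(t + 1, m)) or any(A[t][j] for j in range(t + 1, n)):
            continue                        # a smaller remainder appeared; repeat
        # Make sure the pivot divides every remaining entry.
        bad = [i for i in range(t + 1, m) for j in range(t + 1, n) if A[i][j] % p]
        if bad:
            A[t] = [a + b for a, b in zip(A[t], A[bad[0]])]  # add a row to row t
            continue
        diag.append(abs(p))
        t += 1
    return diag


def abelian_group(A):
    """Return (orders of the finite cyclic summands, rank of the free summand)."""
    d = diagonal_entries(A)
    cyclic_orders = [x for x in d if x > 1]  # diagonal entries equal to 1 are dropped
    free_rank = len(A) - len(d)              # generators not involved in any relation
    return cyclic_orders, free_rank


print(abelian_group([[4, 0], [0, 6]]))       # two generators with 4*v1 = 0, 6*v2 = 0
print(abelian_group([[3], [4]]))             # generators x, y with 3x + 4y = 0
```

In the first call the diagonal entries 4 and 6 do not satisfy the divisibility condition, so the sketch replaces them by 2 and 12, in line with $C_4 \oplus C_6 \approx C_2 \oplus C_{12}$.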


A finite abelian group is finitely generated, so as stated above, the Structure Theorem 
decomposes a finite abelian group into a direct sum of finite cyclic groups, in which the order 
of each summand divides the next. The free summand will be zero. 


It is sometimes convenient to decompose the cyclic groups further, into cyclic groups
of prime power order. This decomposition is based on Proposition 2.11.3: If $a$ and $b$ are
relatively prime integers, the cyclic group $C_{ab}$ of order $ab$ is isomorphic to the direct sum
$C_a \oplus C_b$ of cyclic subgroups of orders $a$ and $b$. Combining this with the Structure Theorem
yields the following:


Corollary 14.7.6 Structure Theorem (Alternate Form). Every finite abelian group is a direct
sum of cyclic groups of prime power orders. $\square$


It is also true that the orders of the cyclic subgroups that occur are uniquely determined
by the group. If the order of $V$ is a product of distinct primes, there is no problem. For
example, if the order is 30, then $V$ must be isomorphic to $C_2 \oplus C_3 \oplus C_5$ and to $C_{30}$.
But is $C_2 \oplus C_2 \oplus C_4$ isomorphic to $C_4 \oplus C_4$? It isn’t difficult to show that it is not, by
counting elements of orders 1 or 2. The group $C_4 \oplus C_4$ contains four such elements, while
$C_2 \oplus C_2 \oplus C_4$ contains eight of them. This counting method always works.
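
The count is easy to reproduce by brute force. The sketch below is only an illustration: it models an element of $C_{d_1} \oplus \cdots \oplus C_{d_r}$ as a tuple of residues, and the helper name `count_killed_by` is an assumption of this example.

```python
from itertools import product


def count_killed_by(orders, n):
    """Count the elements x of C_{d_1} + ... + C_{d_r} (direct sum) with n*x = 0.

    `orders` lists the d_i; an element is a tuple of residues modulo the d_i.
    """
    return sum(
        all((n * x) % d == 0 for x, d in zip(element, orders))
        for element in product(*(range(d) for d in orders))
    )


print(count_killed_by((4, 4), 2))      # elements of order 1 or 2 in C_4 + C_4
print(count_killed_by((2, 2, 4), 2))   # elements of order 1 or 2 in C_2 + C_2 + C_4
```

Both groups have order 16, but the two counts differ, so the groups cannot be isomorphic.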


Theorem 14.7.7 Uniqueness for the Structure Theorem. Suppose that a finite abelian group
$V$ is a direct sum of cyclic groups of prime power orders $d_j = p_j^{r_j}$. The integers $d_j$ are
uniquely determined by the group $V$.


Proof. Let $p$ be one of the primes that appear in the direct sum decomposition of $V$, and let
$c_i$ denote the number of cyclic groups of order $p^i$ in the decomposition. The set of elements
whose orders divide $p^i$ is a subgroup of $V$ whose order is a power of $p$, say $p^{\ell_i}$. Let $k$ be the
largest index such that $c_k > 0$. Then

$$\begin{aligned}
\ell_1 &= c_1 + c_2 + c_3 + \cdots + c_k \\
\ell_2 &= c_1 + 2c_2 + 2c_3 + \cdots + 2c_k \\
\ell_3 &= c_1 + 2c_2 + 3c_3 + \cdots + 3c_k \\
&\ \ \vdots \\
\ell_k &= c_1 + 2c_2 + 3c_3 + \cdots + kc_k.
\end{aligned}$$

The exponents $\ell_i$ determine the integers $c_i$. $\square$

The integers $d_i$ are also uniquely determined when they are chosen, as in Theorem 14.7.3,
so that $d_1 \mid \cdots \mid d_k$.
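
The system of equations in the proof of Theorem 14.7.7 can be inverted explicitly. Since $\ell_i - \ell_{i-1} = c_i + c_{i+1} + \cdots + c_k$, one gets $c_i = 2\ell_i - \ell_{i-1} - \ell_{i+1}$, with the conventions $\ell_0 = 0$ and $\ell_{k+1} = \ell_k$. The following sketch checks this round trip; the function names are chosen for this illustration only, and the input lists are assumed to be nonempty.

```python
def exponents_from_counts(c):
    """Given c = [c_1, ..., c_k], return [l_1, ..., l_k] with l_i = sum_j min(i, j) * c_j."""
    k = len(c)
    return [sum(min(i, j) * c[j - 1] for j in range(1, k + 1))
            for i in range(1, k + 1)]


def counts_from_exponents(l):
    """Recover [c_1, ..., c_k] from [l_1, ..., l_k] via c_i = 2*l_i - l_{i-1} - l_{i+1}."""
    k = len(l)
    ext = [0] + list(l) + [l[-1]]      # conventions: l_0 = 0 and l_{k+1} = l_k
    return [2 * ext[i] - ext[i - 1] - ext[i + 1] for i in range(1, k + 1)]


# C_2 + C_2 + C_4 with p = 2: two summands of order 2, one of order 4.
l = exponents_from_counts([2, 1])
print(l, counts_from_exponents(l))
```

For this group $\ell_1 = 3$, which matches the eight elements of order 1 or 2 counted above.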


14.8 APPLICATION TO LINEAR OPERATORS 


The classification of abelian groups has an analogue for the polynomial ring $R = F[t]$ in one
variable over a field $F$. Theorem 14.4.6 about diagonalizing integer matrices carries over
because the key ingredient in the proof of Theorem 14.4.6, the division algorithm, is available
in $F[t]$. And since the polynomial ring is noetherian, any finitely generated $R$-module $V$ has
a presentation matrix (14.2.7).


Theorem 14.8.1 Let $R = F[t]$ be a polynomial ring in one variable over a field $F$ and let
$A$ be an $m \times n$ $R$-matrix. There are products $Q$ and $P$ of elementary $R$-matrices such that
$A' = Q^{-1}AP$ is diagonal, each nonzero diagonal entry $d_i$ of $A'$ is a monic polynomial, and
$d_1 \mid d_2 \mid \cdots \mid d_k$.


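Example 14.8.2. Diagonalization of a matrix of polynomials:

$$A = \begin{pmatrix} t^2-3t+1 & t-2 \\ (t-1)^3 & t^2-3t+2 \end{pmatrix}
\xrightarrow{\text{row}} \begin{pmatrix} t^2-3t+1 & t-2 \\ t^2-t & 0 \end{pmatrix}
\xrightarrow{\text{col}} \begin{pmatrix} -1 & t-2 \\ t^2-t & 0 \end{pmatrix}$$
$$\xrightarrow{\text{col}} \begin{pmatrix} -1 & 0 \\ t^2-t & t^3-3t^2+2t \end{pmatrix}
\xrightarrow{\text{row}} \begin{pmatrix} 1 & 0 \\ 0 & t^3-3t^2+2t \end{pmatrix}.$$

Note: It is not surprising that we ended up with 1 in the upper left corner in this example.
This will happen whenever the greatest common divisor of the matrix entries is 1. $\square$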

As is true for the ring of integers, Theorem 14.8.1 provides us with a method to 
determine the polynomial solutions of a system AX = B, when the entries of A and B are 
polynomial matrices (see Proposition 14.4.9). 


We extend the structure theorem to polynomial rings next. To carry along the analogy
with abelian groups, we define a cyclic $R$-module $C$, where $R$ is any ring, to be a module
that is generated by a single element $v$. Then there is a surjective homomorphism $\varphi: R \to C$
that sends $r \mapsto rv$. The kernel of $\varphi$, the module of relations, is a submodule of $R$, an ideal $I$.
By the First Isomorphism Theorem, $C$ is isomorphic to the $R$-module $R/I$.

When $R = F[t]$, the ideal $I$ will be principal, and $C$ will be isomorphic to $R/(d)$ for
some polynomial $d$. The module of relations will be generated by a single element.


Theorem 14.8.3 Structure Theorem for Modules over Polynomial Rings. Let $R = F[t]$ be
the ring of polynomials in one variable with coefficients in a field $F$.


(a) Let $V$ be a finitely generated module over $R$. Then $V$ is a direct sum of cyclic modules
$C_1, C_2, \ldots, C_k$ and a free module $L$, where $C_i$ is isomorphic to $R/(d_i)$, the elements
$d_1, \ldots, d_k$ are monic polynomials of positive degree, and $d_1 \mid d_2 \mid \cdots \mid d_k$.

(b) The same assertion as (a), except that the condition that $d_i$ divides $d_{i+1}$ is replaced by:
Each $d_i$ is a power of a monic irreducible polynomial. $\square$

It is also true that the prime powers occurring in (b) are unique, but we won’t take the time
to prove this.


For example, let $R = \mathbb{R}[t]$, and let $V$ be the $R$-module presented by the matrix $A$ of Example
14.8.2. It is also presented by the diagonal matrix

$$A' = \begin{pmatrix} 1 & 0 \\ 0 & t^3-3t^2+2t \end{pmatrix},$$

and we can drop the first row and column from this matrix (14.5.7). So $V$ is presented by
the $1 \times 1$ matrix $[g]$, where $g(t) = t^3 - 3t^2 + 2t = t(t - 1)(t - 2)$. This means that $V$ is a
cyclic module, isomorphic to $C = R/(g)$. Since $g$ has three relatively prime factors, $V$ can
be further decomposed. It is isomorphic to a direct sum of cyclic $R$-modules:


(14.8.4) $R/(g) \approx (R/(t)) \oplus (R/(t - 1)) \oplus (R/(t - 2))$.


We now apply the theory we have developed to study linear operators on vector spaces 
over a field. This application provides a good example of how abstraction can lead to new 
insights. The method developed for abelian groups is extended formally to modules over 
polynomial rings, and is then applied in a concrete new situation. This was not the historical 
development. The theories for abelian groups and for linear operators were developed 
independently and were tied together later. But it is striking that the two cases, abelian 
groups and linear operators, can end up looking so different when the same theory is applied 
to them. 

The key observation that allows us to proceed is that if we are given a linear operator


(14.8.5) $T: V \to V$


on a vector space over a field $F$, we can use this operator to make $V$ into a module over the
polynomial ring $F[t]$. To do so, we must define multiplication of a vector $v$ by a polynomial
$f(t) = a_n t^n + \cdots + a_1 t + a_0$. We set


(14.8.6) $f(t)v = a_n T^n(v) + a_{n-1}T^{n-1}(v) + \cdots + a_1 T(v) + a_0 v$.


The right side could also be written as $[f(T)](v)$, where $f(T)$ denotes the linear operator
$a_n T^n + a_{n-1}T^{n-1} + \cdots + a_1 T + a_0 I$. (The brackets have been added to make it clear that
it is the operator $f(T)$ that acts on $v$.) With this notation, we obtain the formulas


(14.8.7) $tv = T(v)$ and $f(t)v = [f(T)](v)$.


The fact that rule (14.8.6) makes $V$ into an $F[t]$-module is easy to verify, and the formulas
(14.8.7) may appear tautological. They raise the question of why we need a new symbol $t$.
But $f(t)$ is a polynomial, while $f(T)$ is a linear operator.
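
Rule (14.8.6) is easy to carry out when $T$ is given by a matrix. The sketch below is an illustration under assumptions of its own: a polynomial is represented by its coefficient list `[a_0, a_1, ..., a_n]`, the operator by a square list of lists, and the name `apply_poly` is made up here. It evaluates $f(t)v$ by Horner's scheme, so no powers of the matrix are formed.

```python
def apply_poly(coeffs, T, v):
    """Compute f(t)v = a_n T^n(v) + ... + a_1 T(v) + a_0 v.

    coeffs = [a_0, a_1, ..., a_n]; T is a square matrix; v is a vector.
    """
    def matvec(M, x):
        return [sum(a * b for a, b in zip(row, x)) for row in M]

    result = [0] * len(v)
    for a in reversed(coeffs):
        # Horner step: result <- T(result) + a * v
        result = [r + a * x for r, x in zip(matvec(T, result), v)]
    return result


T = [[0, -1], [1, 0]]                    # rotation of the plane by a quarter turn
print(apply_poly([0, 1], T, [3, 5]))     # t v = T(v)
print(apply_poly([1, 0, 1], T, [3, 5]))  # (t^2 + 1) v, which is zero since T^2 = -I
```

The second call shows that the module structure depends on the operator: the polynomial $t^2 + 1$ is not zero, but it acts as zero on this module.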

Conversely, if $V$ is an $F[t]$-module, scalar multiplication of elements of $V$ by a
polynomial is defined. In particular, we are given a rule for multiplying by the constant
polynomials, the elements of $F$. If we keep the rule for multiplying by constants but forget
for the moment about multiplication by nonconstant polynomials, then the axioms for a
module show that $V$ becomes a vector space over $F$ (14.1.1). Next, we can multiply elements
of $V$ by the polynomial $t$. Let us denote the operation of multiplication by $t$ on $V$ as $T$. So $T$
is the map


(14.8.8) $T: V \to V$, defined by $T(v) = tv$.


This map is a linear operator when $V$ is considered as a vector space over $F$. By the
distributive law, $t(v + v') = tv + tv'$, therefore $T(v + v') = T(v) + T(v')$. If $c$ is a scalar,
then $tcv = ctv$, and therefore $T(cv) = cT(v)$. So an $F[t]$-module $V$ provides us with a
linear operator on a vector space. The rules we have described, going from linear operators
to modules and back, are inverse operations.


(14.8.9) Linear operator on an $F$-vector space and $F[t]$-module are equivalent concepts.


We will want to apply this observation to finite-dimensional vector spaces, but we note
in passing the linear operator that corresponds to the free $F[t]$-module of rank 1. When $F[t]$
is considered as a vector space over $F$, the monomials $(1, t, t^2, \ldots)$ form a basis, and we
can use this basis to identify $F[t]$ with the infinite-dimensional space $Z$, the space of infinite
row vectors $(a_0, a_1, a_2, \ldots)$ with finitely many entries different from zero that was defined
in (3.7.2). Multiplication by $t$ on $F[t]$ corresponds to the shift operator $T$:


$$(a_0, a_1, a_2, \ldots) \mapsto (0, a_0, a_1, a_2, \ldots).$$


The shift operator on the space $Z$ corresponds to the free $F[t]$-module of rank 1.

We now begin our application to linear operators. Given a linear operator $T$ on a
vector space $V$ over $F$, we may also view $V$ as an $F[t]$-module. We suppose that $V$ is
finite-dimensional as a vector space, say of dimension $n$. Then it is finitely generated as a
module, and it has a presentation matrix. There is some danger of confusion here, because
there are two matrices around: the presentation matrix for the module $V$, and the matrix of
the linear operator $T$. The presentation matrix is an $r \times s$ matrix with polynomial entries,
where $r$ is the number of chosen generators for the module and $s$ is the number of relations.
The matrix of the linear operator is an $n \times n$ matrix whose entries are scalars, where $n$ is the
dimension of $V$. Both matrices contain the information needed to describe the module and
the linear operator.

Regarding $V$ as an $F[t]$-module, we can apply Theorem 14.8.3 to conclude that $V$ is a
direct sum of cyclic submodules, say


$$V = W_1 \oplus \cdots \oplus W_k,$$


where $W_i$ is isomorphic to $F[t]/(f_i)$, $f_i$ being a monic polynomial in $F[t]$. When $V$ is
finite-dimensional, the free summand is zero.

To interpret the meaning of the direct sum decomposition for the linear operator $T$, we
choose bases $\mathbf{B}_i$ for the subspaces $W_i$. Then with respect to the basis $\mathbf{B} = (\mathbf{B}_1, \ldots, \mathbf{B}_k)$, the
matrix of $T$ has a block form (4.4.4), where the blocks are the matrices of $T$ restricted to the
invariant subspaces $W_i$. Perhaps it will be enough to examine the operator that corresponds
to a cyclic module.

Let $W$ be a cyclic $F[t]$-module, generated as a module by a single element that we
label as $w_0$. Since every ideal of $F[t]$ is principal, $W$ will be isomorphic to $F[t]/(f)$,
where $f = t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0$ is a monic polynomial in $F[t]$. The isomorphism
$F[t]/(f) \to W$ will send $1 \mapsto w_0$. The set $(1, t, \ldots, t^{n-1})$ is a basis of $F[t]/(f)$ (11.5.5), so
the set $(w_0, tw_0, t^2w_0, \ldots, t^{n-1}w_0)$ is a basis of $W$ as vector space.

The corresponding linear operator $T: W \to W$ is multiplication by $t$. Written in terms
of $T$, the basis of $W$ is $(w_0, w_1, \ldots, w_{n-1})$, with $w_j = T^j w_0$. Then

$$T(w_0) = w_1, \quad T(w_1) = w_2, \quad \ldots, \quad T(w_{n-2}) = w_{n-1}, \quad \text{and}$$
$$[f(T)]w_0 = T^n w_0 + a_{n-1}T^{n-1}w_0 + \cdots + a_1 Tw_0 + a_0 w_0 = Tw_{n-1} + a_{n-1}w_{n-1} + \cdots + a_1 w_1 + a_0 w_0 = 0.$$

This determines the matrix of $T$. It has the form illustrated below for small values of $n$:

(14.8.10)
$$[-a_0], \quad \begin{pmatrix} 0 & -a_0 \\ 1 & -a_1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 & -a_0 \\ 1 & 0 & -a_1 \\ 0 & 1 & -a_2 \end{pmatrix}, \quad \ldots$$

The characteristic polynomial of this matrix is $f(t)$.
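
The shape (14.8.10) and the claim about the characteristic polynomial can be spot-checked numerically. The following sketch builds the matrix from the coefficients $a_0, \ldots, a_{n-1}$ and evaluates $\det(tI - C)$ at integer points by cofactor expansion. The names `companion`, `det`, and `char_poly_at` belong to this illustration, and the cofactor expansion is meant only for small $n$.

```python
def companion(a):
    """Matrix of multiplication by t on F[t]/(f), f = t^n + a_{n-1} t^{n-1} + ... + a_0.

    a = [a_0, ..., a_{n-1}]; the basis is (w_0, ..., w_{n-1}) with w_j = T^j w_0.
    """
    n = len(a)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1              # T(w_{i-1}) = w_i
    for i in range(n):
        C[i][n - 1] = -a[i]          # T(w_{n-1}) = -a_0 w_0 - ... - a_{n-1} w_{n-1}
    return C


def det(M):
    """Determinant by expansion along the first row (fine for small matrices)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def char_poly_at(C, t):
    """Evaluate det(tI - C) at the number t."""
    n = len(C)
    return det([[(t if i == j else 0) - C[i][j] for j in range(n)] for i in range(n)])


C = companion([-1, 0, 0])            # f = t^3 - 1
print(C)
print([char_poly_at(C, t) == t ** 3 - 1 for t in range(-3, 4)])
```

Two monic polynomials of degree $n$ that agree at more than $n$ points are equal, so agreement at these seven points confirms the claim for this particular $f$.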


Theorem 14.8.11 Let T be a linear operator on a finite-dimensional vector space V over a 
field F. There is a basis for V with respect to which the matrix of T is made up of blocks of 
the type shown above. O 


This form for the matrix of a linear operator is called a rational canonical form. It is the best 
available for an arbitrary field. 


Example 14.8.12 Let $F = \mathbb{R}$. The matrix $A$ shown below is in rational canonical form. Its
characteristic polynomial is $t^3 - 1$. Since this is a product of relatively prime polynomials:
$t^3 - 1 = (t - 1)(t^2 + t + 1)$, the cyclic $\mathbb{R}[t]$-module that it presents is a direct sum of cyclic
modules. The matrix $A'$ is another rational canonical form that describes the same module.
Over the complex numbers, $A$ is diagonalizable. Its diagonal form is $A''$, where $\omega = e^{2\pi i/3}$.

(14.8.13)
$$A = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad
A' = \begin{pmatrix} 1 & & \\ & 0 & -1 \\ & 1 & -1 \end{pmatrix}, \quad
A'' = \begin{pmatrix} 1 & & \\ & \omega & \\ & & \omega^2 \end{pmatrix}.$$

Various relations between properties of an $F[t]$-module and the corresponding linear
operator are summed up in the table below.

(14.8.14)
F[t]-module                       Linear operator T
multiplication by t               operation of T
free module of rank 1             shift operator
submodule                         T-invariant subspace
direct sum of submodules          direct sum of T-invariant subspaces
cyclic module generated by w      subspace spanned by w, T(w), T^2(w), ...


14.9 POLYNOMIAL RINGS IN SEVERAL VARIABLES


Modules over a ring become increasingly complicated with increasing complication of the 
ring, and it can be difficult to determine whether or not an explicitly presented module is 
free. In this section we describe, without proof, a theorem that characterizes free modules 
over polynomial rings in several variables. This theorem was proved by Quillen and Suslin 
in 1976. 

Let $R = \mathbb{C}[x_1, \ldots, x_k]$ be the polynomial ring in $k$ variables, and let $V$ be a finitely
generated $R$-module. Let $A$ be a presentation matrix for $V$. The entries of $A$ will be
polynomials $a_{ij}(x)$, and if $A$ is an $m \times n$ matrix, then $V$ is isomorphic to the cokernel
$R^m/AR^n$ of multiplication by $A$ on $R$-vectors.

When we evaluate the matrix entries $a_{ij}(x)$ at a point $(c_1, \ldots, c_k)$ of $\mathbb{C}^k$ we obtain a
complex matrix $A(c)$ whose $i,j$-entry is $a_{ij}(c)$.


Theorem 14.9.1 Let $V$ be a finitely generated module over the polynomial ring $\mathbb{C}[x_1, \ldots, x_k]$,
and let $A$ be an $m \times n$ presentation matrix for $V$. Denote by $A(c)$ the evaluation of $A$ at a
point $c$ of $\mathbb{C}^k$. Then $V$ is a free module of rank $r$ if and only if the matrix $A(c)$ has rank $m - r$
at every point $c$.


The proof of this theorem requires too much background to give here. However, we can use
it to determine whether or not a given module is free. For example, let $V$ be the module
over $\mathbb{C}[x, y]$ presented by the $4 \times 2$ matrix

(14.9.2)
$$A = \begin{pmatrix} 1 & x \\ y & x+3 \\ x & y \\ x^2 & y^2 \end{pmatrix}.$$

So $V$ has four generators, say $v_1, \ldots, v_4$, and two relations:

$$v_1 + yv_2 + xv_3 + x^2v_4 = 0 \quad \text{and} \quad xv_1 + (x + 3)v_2 + yv_3 + y^2v_4 = 0.$$

It isn’t very hard to show that $A(c)$ has rank 2 for every point $c$ in $\mathbb{C}^2$. Theorem 14.9.1 tells
us that $V$ is a free module of rank 2.
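
One way to check the rank claim is to look at $2 \times 2$ minors. The derivation below is supplied as an illustration and is not taken from the text. It takes the columns of $A$ to be $(1, y, x, x^2)$ and $(x, x+3, y, y^2)$, as the two relations indicate, and writes $M_{1j}$ for the minor formed from rows 1 and $j$.

```latex
\begin{align*}
M_{12} &= \det\begin{pmatrix} 1 & x \\ y & x+3 \end{pmatrix} = x + 3 - xy, \\
M_{13} &= \det\begin{pmatrix} 1 & x \\ x & y \end{pmatrix} = y - x^2, \\
M_{14} &= \det\begin{pmatrix} 1 & x \\ x^2 & y^2 \end{pmatrix} = y^2 - x^3. \\
\intertext{At a common zero, $M_{13} = 0$ gives $y = x^2$, and then}
M_{14} &= x^4 - x^3 = x^3(x - 1) = 0 \quad\Longrightarrow\quad x = 0 \ \text{or}\ x = 1, \\
M_{12} &= x + 3 - x^3 = 3 \neq 0 \quad \text{for } x = 0 \text{ and for } x = 1 .
\end{align*}
```

So the three minors have no common zero in $\mathbb{C}^2$, and $A(c)$ has rank 2 at every point. With $m = 4$ this gives $r = 4 - 2 = 2$ in Theorem 14.9.1.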

One can get an intuitive understanding for this theorem by considering the vector
space $W(c)$ spanned by the columns of the matrix $A(c)$. It is a subspace of $\mathbb{C}^m$. As $c$ varies
in the space $\mathbb{C}^k$, the matrix $A(c)$ varies continuously. Therefore the subspace $W(c)$ will also
vary continuously, provided that its dimension does not jump around. Continuous families
of vector spaces of constant dimension, parametrized by a topological space $\mathbb{C}^k$, are called
vector bundles over $\mathbb{C}^k$. The module $V$ is free if and only if the family of vector spaces $W(c)$
forms a vector bundle.


“Through a distortion customary among mathematicians,
I was holding to too narrow a point of view.”


—Jean-Louis Verdier 
EXERCISES


Section 1 Modules 


1.1. Let $R$ be a ring, and let $V$ denote the $R$-module $R$. Determine all homomorphisms
$\varphi: V \to V$.


1.2. Let V be an abelian group. Prove that if V has a structure of Q-module with its given law 
of composition as addition, then that structure is uniquely determined. 


1.3. Let $R = \mathbb{Z}[\alpha]$ be the ring generated over $\mathbb{Z}$ by an algebraic integer $\alpha$. Prove that for any
integer $m$, $R/mR$ is finite, and determine its order.


1.4. A module is called simple if it is not the zero module and if it has no proper submodule. 


(a) Prove that any simple R-module is isomorphic to an R-module of the form R/M, 
where M is a maximal ideal. 

(b) Prove Schur’s Lemma: Let $\varphi: S \to S'$ be a homomorphism of simple modules. Then
$\varphi$ is either zero, or an isomorphism.


Section 2 Free Modules


2.1. Let $R = \mathbb{C}[x, y]$, and let $M$ be the ideal of $R$ generated by the two elements $x$ and $y$. Is
$M$ a free $R$-module?


2.2. Prove that a ring R having the property that every finitely generated R-module is free is 
either a field or the zero ring. 


2.3. Let $A$ be the matrix of a homomorphism $\varphi: \mathbb{Z}^n \to \mathbb{Z}^m$ of free $\mathbb{Z}$-modules.


(a) Prove that $\varphi$ is injective if and only if the rank of $A$, as a real matrix, is $n$.


(b) Prove that $\varphi$ is surjective if and only if the greatest common divisor of the
determinants of the $m \times m$ minors of $A$ is 1.


2.4. Let $I$ be an ideal of a ring $R$.
(a) Under what circumstances is $I$ a free $R$-module?
(b) Under what circumstances is the quotient ring $R/I$ a free $R$-module?


Section 3 Identities

3.1. Let $\tilde{f}$ denote the function on $\mathbb{C}^n$ defined by evaluation of a (formal) complex polynomial
$f(x_1, \ldots, x_n)$. Prove that if $\tilde{f}$ is the zero function, then $f$ is the zero polynomial.

3.2. It might be convenient to verify an identity only for the real numbers. Would this
suffice?


3.3. Let $A$ and $B$ be $m \times m$ and $n \times n$ $R$-matrices, respectively. Use permanence of identities
to prove that the trace of the linear operator $f(M) = AMB$ on the space $R^{m \times n}$ is the product
(trace $A$)(trace $B$).


3.4. In each case, decide whether or not permanence of identities allows the result to be 
carried over from the complex numbers to an arbitrary commutative ring. 


(a) the associative law for matrix multiplication,
(b) the Cayley-Hamilton Theorem,
(c) Cramer’s Rule,
(d) the product rule, quotient rule, and chain rule for differentiation of polynomials,
(e) the fact that a polynomial of degree $n$ has at most $n$ roots,
(f) Taylor expansion of a polynomial.


Section 4 Diagonalizing Integer Matrices 


4.1. (a) Reduce each matrix to diagonal form by integer row and column operations. 


ee bee ae: 

-1 2 2 4 6 a ee 

(b) For the first matrix, let $V = \mathbb{Z}^2$ and let $L = AV$. Draw the sublattice $L$, and find
bases of $V$ and $L$ that exhibit the diagonalization.


(c) Determine integer matrices $Q^{-1}$ and $P$ that diagonalize the second matrix.


4.2. Let $d_1, d_2, \ldots$ be the integers referred to in Theorem 14.4.6. Prove that $d_1$ is the greatest
common divisor of the entries $a_{ij}$ of $A$.


4.3. Determine all integer solutions to the system of equations AX = 0, when 
a E y al Find a basis for the space of integer column vectors B such that AX = B 


has a solution. 


4.4. Find a basis for the Z-module of integer solutions of the system of equations 
x+2y+3z=0,x+4y+9z=0. 


4.5. Let $\alpha, \beta, \gamma$ be complex numbers. Under what conditions is the set of integer linear
combinations $\{\ell\alpha + m\beta + n\gamma \mid \ell, m, n \in \mathbb{Z}\}$ a lattice in the complex plane?


4.6. Let $\varphi: \mathbb{Z}^k \to \mathbb{Z}^k$ be a homomorphism given by multiplication by an integer matrix $A$.
Show that the image of $\varphi$ is of finite index if and only if $A$ is nonsingular and that if so,
then the index is equal to $|\det A|$.


4.7. Let $A = (a_1, \ldots, a_n)^t$ be an integer column vector, and let $d$ be the greatest common
divisor of $a_1, \ldots, a_n$. Prove that there is a matrix $P \in GL_n(\mathbb{Z})$ such that $PA =
(d, 0, \ldots, 0)^t$.

4.8. Use invertible row and column operations in the ring $\mathbb{Z}[i]$ of Gauss integers to diagonalize
the matrix $\begin{pmatrix} 3 & 2+i \\ 2-i & 9 \end{pmatrix}$.


4.9. Use diagonalization to prove that if $L \subset M$ are lattices, then $[M : L] = \dfrac{\Delta(L)}{\Delta(M)}$.


Section 5 Generators and Relations 


5.1. Let $R = \mathbb{Z}[\delta]$, where $\delta = \sqrt{-5}$. Determine a presentation matrix as $R$-module for the
ideal $(2, 1 + \delta)$.


5.2. Identify the abelian group presented by the matrix $\begin{pmatrix} 3 & 1 & 2 \\ 1 & 1 & 1 \\ 2 & 3 & 6 \end{pmatrix}$.


Section 6 Noetherian Rings


6.1. Let $V \subset \mathbb{C}^n$ be the locus of common zeros of an infinite set of polynomials $f_1, f_2, f_3, \ldots$.
Prove that there is a finite subset of these polynomials whose zeros define the same locus.


6.2. Find an example of a ring $R$ and an ideal $I$ of $R$ that is not finitely generated.


Section 7 Structure of Abelian Groups


7.1. Find a direct sum of cyclic groups isomorphic to the abelian group presented by the matrix
$\begin{pmatrix} 2 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 2 \end{pmatrix}$.


7.2. Write the abelian group generated by x and y, with the relation 3x + 4y = 0 as a direct 
sum of cyclic groups. 


7.3. Find an isomorphic direct product of cyclic groups, when V is the abelian group generated 
by x, y, z, with the given relations. 


(a) 3x + 2y + 8z = 0, 2x + 4z = 0
(b) x+ y=0, 2x =0, 4x +2z =0,4x+2y+2z=0 
(c) 2x+y=0,x-y+3z=0 
(d) 7x +5y + 2z =0,3x + 3y = 0, 13x +11y+2z=0 
7.4. In each case, identify the abelian group that has the given presentation matrix: 


FARE Bega res eae Parle 


7.5. Determine the number of isomorphism classes of abelian groups of order 400. 


7.6. (a) Let $a$ and $b$ be relatively prime positive integers. By manipulating the diagonal matrix
with diagonal entries $a$ and $b$, prove that the cyclic group $C_{ab}$ is isomorphic to the
product $C_a \oplus C_b$.


(b) What can you say if the assumption that a and b are relatively prime is dropped? 


7.7. Let $R = \mathbb{Z}[i]$ and let $V$ be the $R$-module generated by elements $v_1$ and $v_2$ with relations
$(1 + i)v_1 + (2 - i)v_2 = 0$, $3v_1 + 5iv_2 = 0$. Write this module as a direct sum of cyclic
modules.

7.8. Let $F = \mathbb{F}_p$. For which prime integers $p$ does the additive group $F^1$ have a structure of
$\mathbb{Z}[i]$-module? How about $F^2$?


7.9. Show that the following concepts are equivalent:
• $R$-module, where $R = \mathbb{Z}[i]$,
• abelian group $V$, with a homomorphism $\varphi: V \to V$ such that $\varphi \circ \varphi = -\text{identity}$.

Section 8 Application to Linear Operators 


8.1. Let T be the linear operator on C2 whose matrix is E "a Is the corresponding 


$\mathbb{C}[t]$-module cyclic?


8.2. Let $M$ be a $\mathbb{C}[t]$-module of the form $\mathbb{C}[t]/(t - \alpha)^n$. Show that there is a $\mathbb{C}$-basis for $M$, such
that the matrix of the corresponding linear operator is a Jordan block.


8.3. Let $R = F[t]$ be the polynomial ring in one variable over a field $F$, and let $V$ be the
$R$-module generated by an element $v$ that satisfies the relation $(t^3 + 3t + 2)v = 0$. Choose
a basis for $V$ as $F$-vector space, and determine the matrix of the operator of multiplication
by $t$ with respect to this basis.


8.4. Let $V$ be an $F[t]$-module, and let $\mathbf{B} = (v_1, \ldots, v_n)$ be a basis for $V$ as $F$-vector space.
Let $B$ be the matrix of $T$ with respect to this basis. Prove that $A = tI - B$ is a presentation
matrix for the module.


8.5. Prove that the characteristic polynomial of the matrix (14.8.10) is $f(t)$.


8.6. Classify finitely generated modules over the ring $\mathbb{C}[\epsilon]$, where $\epsilon^2 = 0$.


Section9 Polynomial Rings in Several Variables 


9.1. 


Determine whether or not the modules over C[x, y] presented by the following matrices 
are free. 


. xy-1 x-1 x 
xe +1 x ia y yr 

(a) Eee al (b) | x ¥ » ©) x y 
y x? 2y 


9.2. Prove that the module presented by (14.9.2) is free by exhibiting a basis. 


9.3. Following the model of the polynomial ring in one variable, describe modules over the
ring $\mathbb{C}[x, y]$ in terms of complex vector spaces with additional structure.


9.4. Prove the easy half of the theorem of Quillen and Suslin: If $V$ is free, then the rank of
$A(c)$ is constant.


9.5.

Let R = Z[V-5], and let V be the module presented by the matrix A = 1 ‘A a Prove 
that the residue of A in R/P has rank 1 for every prime ideal P of R, but that V is nota 
free module. 


Miscellaneous Problems 


M.1. In how many ways can the additive group Z/5Z be given the structure of a module over 


the Gauss integers? 


M.2. Classify finitely generated modules over the ring Z/(6). 


M.3. Let $A$ be a finite abelian group, and let $\varphi: A \to \mathbb{C}^\times$ be a homomorphism that is not the
trivial homomorphism. Prove that $\sum_{a \in A} \varphi(a) = 0$.


M.4. When an integer $2 \times 2$ matrix $A$ is diagonalized by $Q^{-1}AP$, how unique are the matrices
$P$ and $Q$?


M.5. Which matrices $A$ in $GL_2(\mathbb{R})$ stabilize some lattice $L$ in $\mathbb{R}^2$?


M.6. (a) Describe the orbits of right multiplication by $G = GL_2(\mathbb{Z})$ on the space of $2 \times 2$
integer matrices.
(b) Show that for any integer matrix $A$, there is an invertible integer matrix $P$ such that
$AP$ has the following Hermitian normal form:

$$\begin{pmatrix} d_1 & 0 & 0 & \cdots \\ a_2 & d_2 & 0 & \cdots \\ a_3 & b_3 & d_3 & \cdots \\ \vdots & & & \ddots \end{pmatrix},$$

where the entries are nonnegative, $a_2 < d_2$, $a_3, b_3 < d_3$, etc.
M.7. Let $S$ be a subring of the polynomial ring $R = \mathbb{C}[t]$ that contains $\mathbb{C}$ and is not equal to $\mathbb{C}$.
Prove that $R$ is a finitely generated $S$-module.
*M.8. (a) Let $\alpha$ be a complex number, and let $\mathbb{Z}[\alpha]$ be the subring of $\mathbb{C}$ generated by $\alpha$. Prove
that $\alpha$ is an algebraic integer if and only if $\mathbb{Z}[\alpha]$ is a finitely generated abelian group.
(b) Prove that if $\alpha$ and $\beta$ are algebraic integers, then the subring $\mathbb{Z}[\alpha, \beta]$ of $\mathbb{C}$ that they
generate is a finitely generated abelian group.
(c) Prove that the algebraic integers form a subring of $\mathbb{C}$.
*M.9. Consider the Euclidean space $V = \mathbb{R}^k$, with dot product $(v \cdot w)$. A lattice $L$ in $V$ is a
discrete subgroup of $V^+$ that contains $k$ independent vectors. If $L$ is a lattice, define
$L^* = \{w \mid (v \cdot w) \in \mathbb{Z} \text{ for all } v \in L\}$.


(a) Show that $L$ has a lattice basis $\mathbf{B} = (v_1, \ldots, v_k)$, a set of $k$ vectors that spans $L$ as
$\mathbb{Z}$-module.

(b) Show that $L^*$ is a lattice, and describe how one can determine a lattice basis for $L^*$
in terms of $\mathbf{B}$.


(c) Under what conditions is $L$ a sublattice of $L^*$?
(d) Suppose that $L \subset L^*$. Find a formula for the index $[L^* : L]$.


*M.10. (a) Prove that the multiplicative group $\mathbb{Q}^\times$ of rational numbers is isomorphic to the
direct sum of a cyclic group of order 2 and a free abelian group with countably many
generators.

(b) Prove that the additive group $\mathbb{Q}^+$ of rational numbers is not a direct sum of two
proper subgroups.
(c) Prove that the quotient group $\mathbb{Q}^+/\mathbb{Z}^+$ is not a direct sum of cyclic groups.


CHAPTER 15


Fields 


Our difficulty is not in the proofs, but in learning what to prove. 


—Emil Artin 


15.1 EXAMPLES OF FIELDS 


Much of the theory of fields has to do with a pair $F \subset K$ of fields, one contained in the other.
Given such a pair, $K$ is called a field extension of $F$, or an extension field. The notation $K/F$
will indicate that $K$ is a field extension of $F$.

Here are the three most important classes of fields. 


Number Fields 
A number field K is a subfield of C. 


Any subfield of C contains the field Q of rational numbers, so it is a field extension of Q. The 
number fields most commonly studied are algebraic number fields, all of whose elements are 
algebraic numbers. We studied quadratic number fields in Chapter 13. 


Finite Fields 


A finite field is a field that contains finitely many elements. 


A finite field contains one of the prime fields Fₚ, and therefore it is an extension of that field. 
Finite fields are described in Section 15.7. 


Function Fields 
Extensions of the field F = C(t) of rational functions are called function fields. 
A function field can be defined by an equation f(t, x) = 0, where f is an irreducible complex 
polynomial in the variables t and x, such as f(t, x) = x² − t³ + t, for example. We may use 
the equation f(t, x) = 0 to define x “implicitly” as a function x(t) of t, as we learn to do in 
calculus. In our example, this function is x(t) = √(t³ − t). The corresponding function field 
K consists of the combinations p + q√(t³ − t), where p and q are rational functions in t. One 
can work in this field just as one would in a field such as Q(√−5). For most polynomials 
f(t, x), there won't be an explicit expression for the implicitly defined function x(t), but 
by definition, it satisfies the equation f(t, x(t)) = 0. We will see in Section 15.9 that x(t) 
defines an extension field of F. 


15.2 ALGEBRAIC AND TRANSCENDENTAL ELEMENTS 


Let K be an extension of a field F, and let α be an element of K. By analogy with the 
definition of algebraic numbers (11.1), α is algebraic over F if it is a root of a monic 
polynomial with coefficients in F, say 


(15.2.1) f(x) = xⁿ + aₙ₋₁xⁿ⁻¹ + ··· + a₀, with aᵢ in F, 


and f(α) = 0. An element is transcendental over F if it is not algebraic over F, that is, if it is not a 
root of any such polynomial. 

These properties, algebraic and transcendental, depend on F. The complex number 
2πi is algebraic over the field of real numbers but transcendental over the field of rational 
numbers. Every element α of a field K is algebraic over K, because it is the root of the 
polynomial x − α, which has coefficients in K. 

The two possibilities for α can be described in terms of the substitution homomorphism 


(15.2.2) φ: F[x] → K, defined by x ↦ α. 


An element α is transcendental over F if φ is injective, and algebraic over F if φ is not 
injective, that is, if the kernel of φ is not zero. We won't have much to say about the case 
that α is transcendental. 

Suppose that α is algebraic over F. Since F[x] is a principal ideal domain, the kernel 
of φ is a principal ideal, generated by a monic polynomial f(x) with coefficients in F. This 
polynomial can be described in various ways. 


Proposition 15.2.3 Let α be an element of an extension field K of a field F that is 
algebraic over F. The following conditions on a monic polynomial f with coefficients in 
F are equivalent. The unique monic polynomial that satisfies these conditions is called the 
irreducible polynomial for α over F. 


• f is the monic polynomial of lowest degree in F[x] that has α as a root. 


• f is an irreducible element of F[x], and α is a root of f. 


• f has coefficients in F, α is a root of f, and the principal ideal of F[x] that is 
generated by f is a maximal ideal. 


• α is a root of f, and if g is any polynomial in F[x] that has α as a root, then f 
divides g. □ 

The degree of the irreducible polynomial for α over F is called the degree of α over F. 


It is important to keep in mind that the irreducible polynomial f depends on F as 
well as on α, because irreducibility of a polynomial depends on the field. The irreducible 
polynomial for √i over Q is x⁴ + 1, but this polynomial factors in the field Q(i). The 
irreducible polynomial for √i over Q(i) is x² − i. When there are several fields around, it is 
ambiguous to say that a polynomial is irreducible. It is better to say that f is irreducible over 
F, or that it is an irreducible element of F[x]. 


Let K be an extension field of F. The subfield of K generated by an element α of K 
will be denoted by F(α): 


(15.2.4) F(α) is the smallest subfield of K that contains F and α. 


Similarly, if α₁, ..., αₖ are elements of an extension field K of F, the notation F(α₁, ..., αₖ) 
will stand for the smallest subfield of K that contains these elements and F. 

As in Chapter 11, we denote the ring generated by α over F by F[α]. It is the image 
of the map φ: F[x] → K defined above, and it consists of the elements β of K that can be 
expressed as polynomials in α with coefficients in F: 


(15.2.5) β = bₙαⁿ + ··· + b₁α + b₀, bᵢ in F. 


The field F(α) is isomorphic to the field of fractions of F[α]. Its elements are ratios of 
elements of the form (15.2.5) (see Section 11.7). 

Similarly, if α₁, ..., αₖ are elements of K, the smallest subring of K that contains F 
and these elements is denoted by F[α₁, ..., αₖ]. It consists of the elements β of K that can 
be expressed as polynomials in the αᵢ with coefficients in F. The field F(α₁, ..., αₖ) is the 
field of fractions of the ring F[α₁, ..., αₖ]. 

If an element α of K is transcendental over F, the map F[x] → F[α] is an isomorphism. 
In that case F(α) is isomorphic to the field F(x) of rational functions. The field extensions 
F(α) are isomorphic for all transcendental elements α. 

Things are different when α is algebraic: 


Proposition 15.2.6 Let α be an element of an extension field K/F which is algebraic over 
F, and let f be the irreducible polynomial for α over F. 

(a) The canonical map F[x]/(f) → F[α] is an isomorphism, and F[α] is a field. Thus 
F[α] = F(α). 

(b) More generally, let α₁, ..., αₖ be elements of an extension field K/F, which are 
algebraic over F. The ring F[α₁, ..., αₖ] is equal to the field F(α₁, ..., αₖ). 


Proof. (a) Let φ: F[x] → K be the map (15.2.2). Since the ideal (f) is maximal, f(x) 
generates the kernel, and F[x]/(f) is isomorphic to the image of φ, which is F[α]. 
Moreover, F[x]/(f) is a field, and therefore F[α] is a field. Since F(α) is the fraction field 
of F[α], it is equal to F[α]. 


(b) This follows by induction: 
F[α₁, ..., αₖ] = F[α₁, ..., αₖ₋₁][αₖ] = F(α₁, ..., αₖ₋₁)[αₖ] = F(α₁, ..., αₖ). □ 


The next proposition is a special case of Proposition 11.5.5. 


Proposition 15.2.7 Let α be an algebraic element over F, and let f(x) be the irreducible 
polynomial for α over F. If f(x) has degree n, i.e., if α has degree n over F, then 
(1, α, ..., αⁿ⁻¹) is a basis for F(α) as a vector space over F. □ 

For instance, the irreducible polynomial for ω = e^(2πi/3) over Q is x² + x + 1. The degree of 
ω over Q is 2, and (1, ω) is a basis for Q(ω) over Q. 
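
To make Propositions 15.2.6 and 15.2.7 concrete, here is a minimal Python sketch of arithmetic in Q[x]/(f), carried out on the basis (1, α, ..., αⁿ⁻¹). The representation (coefficient lists, lowest degree first) and the helper names reduce_mod and mul are our own choices for illustration; they are not notation from the text. The example uses f = x² + x + 1 and checks that ω² = −1 − ω and ω³ = 1.

```python
from fractions import Fraction

def reduce_mod(p, f):
    """Reduce the polynomial p modulo the monic polynomial f.
    Polynomials are coefficient lists, lowest degree first."""
    p = [Fraction(c) for c in p]
    n = len(f) - 1                       # degree of f
    while len(p) > n:
        lead = p.pop()                   # leading coefficient of p
        shift = len(p) - n               # lead * x^shift * f cancels that term
        for i in range(n):
            p[shift + i] -= lead * f[i]
    return p + [Fraction(0)] * (n - len(p))

def mul(a, b, f):
    """Multiply two residues modulo f."""
    prod = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += x * y
    return reduce_mod(prod, f)

f = [1, 1, 1]                    # x^2 + x + 1
omega = [0, 1]                   # the residue of x, playing the role of ω
omega2 = mul(omega, omega, f)    # coordinates of ω^2 on the basis (1, ω)
omega3 = mul(omega2, omega, f)   # coordinates of ω^3 on the basis (1, ω)
print(omega2, omega3)
```

Every product is again a pair of coordinates on (1, ω), which is the computational content of the statement that F[α] is already a field of dimension n over F.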


It may not be easy to tell whether two algebraic elements α and β generate isomorphic 
field extensions, though Proposition 15.2.7 provides a necessary condition: They must have 
the same degree over F, because the degree of α over F is the dimension of F(α) as an 
F-vector space. This is obviously not a sufficient condition. All of the imaginary quadratic 
fields studied in Chapter 13 are obtained by adjoining elements of degree 2 over Q, but they 
aren't isomorphic. 

On the other hand, if α is a complex root of x³ − x + 1, then β = α + 1 is a root of 
x³ − 3x² + 2x + 1. The fields Q(α) and Q(β) are the same. If we were presented only with 
the two polynomials, it might take some time to notice how they are related. 
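
The relation between these two polynomials can be checked mechanically: if p(α) = 0 and β = α + c, then β is a root of p(x − c). The sketch below substitutes x − c into a polynomial by Horner's rule. The function name compose_shift and the list representation (lowest degree first) are assumptions made for this illustration.

```python
def compose_shift(coeffs, c):
    """Return the coefficients of p(x - c), lowest degree first,
    where coeffs lists the coefficients of p(x), lowest degree first."""
    result = [0]
    for a in reversed(coeffs):
        # Horner step: result <- result * (x - c) + a
        shifted = [0] + result
        for i, r in enumerate(result):
            shifted[i] -= c * r
        shifted[0] += a
        result = shifted
    while len(result) > 1 and result[-1] == 0:
        result.pop()                     # drop a zero leading coefficient
    return result

p = [1, -1, 0, 1]                # x^3 - x + 1
q = compose_shift(p, 1)          # polynomial satisfied by beta = alpha + 1
print(q)
```

The list q holds the coefficients 1, 2, −3, 1, that is, x³ − 3x² + 2x + 1.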

What we can describe easily are the circumstances under which there is an isomorphism 
F(α) → F(β) that fixes F and sends α to β. The next proposition, though very simple, is 
fundamental to our understanding of field extensions. 


Proposition 15.2.8 Let F be a field, and let α and β be elements of field extensions K/F 
and L/F. Suppose that α and β are algebraic over F. There is an isomorphism of fields 
σ: F(α) → F(β) that is the identity on F and that sends α ↦ β if and only if the irreducible 
polynomials for α and β over F are equal. 


Proof. Since α is algebraic over F, F[α] = F(α), and similarly, F[β] = F(β). Suppose that 
the irreducible polynomials for α and for β over F are both equal to f. Proposition 15.2.6 
tells us that there are isomorphisms 


φ: F[x]/(f) → F[α] and ψ: F[x]/(f) → F[β]. 


The composed map σ = ψφ⁻¹ is the required isomorphism F(α) → F(β). Conversely, if 
there is an isomorphism σ that is the identity on F and that sends α to β, and if f(x) is a 
polynomial with coefficients in F such that f(α) = 0, then f(β) = 0 too. (See Proposition 
15.2.10 below.) So the irreducible polynomials for the two elements are equal. □ 


For instance, let α₁ denote the real cube root of 2, and let ω = e^(2πi/3) be a complex 
cube root of 1. The three complex roots of x³ − 2 are α₁, α₂ = ωα₁, and α₃ = ω²α₁. Therefore 
there is an isomorphism Q(α₁) → Q(α₂) that sends α₁ to α₂. In this case the elements of 
Q(α₁) are real numbers, but α₂ is not a real number. To understand this isomorphism, we 
must look only at the internal algebraic structure of the fields. 


Definition 15.2.9 Let K and K′ be extensions of the same field F. An isomorphism 
φ: K → K′ that restricts to the identity on the subfield F is called an F-isomorphism, or an 
isomorphism of field extensions. If there exists an F-isomorphism φ: K → K′, K and K′ are 
isomorphic extension fields. 


The next proposition was proved for complex conjugation before (12.2.19). 

Proposition 15.2.10 Let φ: K → K′ be an isomorphism of field extensions of F, and let f 
be a polynomial with coefficients in F. Let α be a root of f in K, and let α′ = φ(α) be its 
image in K′. Then α′ is also a root of f. 

Proof. Say that f(x) = aₙxⁿ + ··· + a₁x + a₀. Since φ is an F-isomorphism and since the aᵢ 
are in F, φ(aᵢ) = aᵢ. Since φ is a homomorphism, 


0 = φ(0) = φ(f(α)) = φ(aₙαⁿ + ··· + a₁α + a₀) 
= φ(aₙ)φ(α)ⁿ + ··· + φ(a₁)φ(α) + φ(a₀) = aₙα′ⁿ + ··· + a₁α′ + a₀. 


Therefore α′ is a root of f. □ 


15.3 THE DEGREE OF A FIELD EXTENSION 


A field extension K of F can always be regarded as an F-vector space. Addition is the 
addition law in K, and scalar multiplication of an element of K by an element of F is 
obtained by multiplying these two elements in K. The dimension of K, when regarded as an 
F-vector space, is called the degree of the field extension. This degree, which is denoted by 
[K: F], is a basic property of a field extension. 


(15.3.1) [K : F] is the dimension of K, as an F-vector space. 


For example, C has the R-basis (1, i), so the degree [C : R] is 2. 
A field extension K/F is a finite extension if its degree is finite. Extensions of degree 2 
are quadratic extensions, those of degree 3 are cubic extensions, and so on. 


Lemma 15.3.2 


(a) A field extension K/F has degree 1 if and only if F = K. 


(b) An element α of a field extension K has degree 1 over F if and only if α is an element 
of F. 


Proof. (a) If the dimension of K as vector space over F is 1, any nonzero element of K, 
including 1, will be an F-basis, and if 1 is a basis, every element of K is in F. 


(b) By definition, the degree of α over F is the degree of the (monic) irreducible polynomial 
for α over F. If α has degree 1, then this polynomial must be x − α, and if x − α has 
coefficients in F, then α is in F. □ 


Proposition 15.3.3 Assume that the field F does not have characteristic 2, that is, 1 + 1 ≠ 0 
in F. Then any extension K of degree 2 over F can be obtained by adjoining a square 
root: K = F(δ), where δ² = d is an element of F. Conversely, if δ is an element of a field 
extension of F, and if δ² is in F but δ is not in F, then F(δ) is a quadratic extension of F. 


It is not true that all cubic extensions can be obtained by adjoining a cube root. We 
learn more about this point in the next chapter (see Section 16.11). 


Proof. We first show that every quadratic extension K can be obtained by adjoining a root 
of a quadratic polynomial f(x) with coefficients in F. We choose an element α of K that 
is not in F. Then (1, α) is a linearly independent set over F. Since K has dimension 2 as a 
vector space over F, this set is a basis for K. It follows that α² is a linear combination of 
(1, α) with coefficients in F. We write this linear combination as α² = −bα − c. Then α is a 
root of f(x) = x² + bx + c, and since α is not in F, this polynomial is irreducible over F. 
This much is also true when the characteristic is 2. 

The discriminant of the quadratic polynomial f is D = b² − 4c. In a field of characteristic 
not 2, the quadratic formula ½(−b ± √D) solves the equation x² + bx + c = 0. This is proved 
by substituting into the polynomial. There are two choices for the square root; let δ be one 
of them. Then δ is in K, δ² is in F, and because α is in the field F(δ), δ generates K over 
F. Conversely, if δ² is in F but δ is not in F, then (1, δ) will be an F-basis for F(δ), so 
[F(δ) : F] = 2. □ 
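
The key step of the proof, that δ = 2α + b has its square in F, can be verified by computing on the basis (1, α) with the rule α² = −bα − c. The following sketch does this over Q. The pair representation, the function name mul, and the sample values b = c = 1 are assumptions chosen only for illustration.

```python
from fractions import Fraction

def mul(u, v, b, c):
    """Multiply u = u0 + u1*alpha and v = v0 + v1*alpha,
    using the relation alpha^2 = -b*alpha - c."""
    u0, u1 = u
    v0, v1 = v
    return (u0 * v0 - c * u1 * v1, u0 * v1 + u1 * v0 - b * u1 * v1)

b, c = Fraction(1), Fraction(1)       # f(x) = x^2 + x + 1, a sample choice
delta = (b, Fraction(2))              # delta = b + 2*alpha
delta_squared = mul(delta, delta, b, c)
print(delta_squared)
```

The α-coordinate of δ² is 2b + 2b − 4b = 0 for every b, and the constant coordinate is b² − 4c, the discriminant D. With the sample values, δ² = −3.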


The term degree comes from the case that K is generated by one algebraic element α: 
K = F(α). This is the first important property of the degree: 


Proposition 15.3.4 


(a) If an element α of an extension field is algebraic over F, the degree [F(α) : F] of F(α) 
over F is equal to the degree of α over F. 


(b) An element α of an extension field is algebraic over F if and only if the degree [F(α) : F] 
is finite. 


Proof. If α is algebraic over F, then by definition, its degree over F is equal to the degree 
of its irreducible polynomial f over F. And if f has degree n, then F(α) has the F-basis 
(1, α, ..., αⁿ⁻¹) (Proposition 15.2.7), so [F(α) : F] = n. If α is not algebraic, then F[α] and 
F(α) have infinite dimension over F. □ 


The second important property relates degrees in chains of field extensions. 


Theorem 15.3.5 Multiplicative Property of the Degree. Let F ⊂ K ⊂ L be fields. Then 
[L : F] = [L : K][K : F]. Therefore both [L : K] and [K : F] divide [L : F]. 


Proof. Let B = (β₁, ..., βₙ) be a basis for L as a K-vector space, and let A = (α₁, ..., αₘ) 
be a basis for K as F-vector space. So [L : K] = n and [K : F] = m. To prove the theorem, 
we show that the set of mn products P = {αᵢβⱼ} is a basis of L as F-vector space. The 
reasoning in case one of the degrees is infinite is similar. 

Let γ be an element of L. Since B is a basis for L over K, γ can be expressed uniquely 
as a linear combination b₁β₁ + ··· + bₙβₙ, with coefficients bⱼ in K. Since A is a basis for 
K over F, each bⱼ can be expressed uniquely as a linear combination a₁ⱼα₁ + ··· + aₘⱼαₘ, 
with coefficients aᵢⱼ in F. Then γ = Σᵢ,ⱼ aᵢⱼαᵢβⱼ. This shows that P spans L as an F-vector 
space. If a linear combination Σᵢ,ⱼ aᵢⱼαᵢβⱼ is zero, then because B is a basis for L as 
K-vector space, the coefficient Σᵢ aᵢⱼαᵢ of βⱼ is zero for every j. This being so, aᵢⱼ is zero 
for every i and every j because A is a basis for K over F. Therefore P is independent, and 
hence it is a basis for L over F. □ 


Corollary 15.3.6 


(a) Let F ⊂ K be a finite field extension of degree n, and let α be an element of K. Then α 
is algebraic over F, and its degree over F divides n. 

(b) Let F ⊂ F′ ⊂ L be fields. If an element α of L is algebraic over F, it is algebraic over 
F′. If α has degree d over F, its degree over F′ is at most d. 

(c) A field extension K that is generated over F by finitely many algebraic elements is a 
finite extension. A finite extension is generated by finitely many elements. 

(d) If K is an extension field of F, the set of elements of K that are algebraic over F is a 
subfield of K. 


Proof. (a) The element α generates an intermediate field F ⊂ F(α) ⊂ K, and the 
multiplicative property states that [K : F] = [K : F(α)][F(α) : F]. Therefore [F(α) : F] is finite, 
and it divides [K : F]. 


(b) Let f denote the irreducible polynomial for α over F. Since F ⊂ F′, f is also an element 
of F′[x]. Since α is a root of f, the irreducible polynomial g for α over F′ divides f. So the 
degree of g is at most equal to the degree of f. 


(c) Let α₁, ..., αₖ be elements that generate K and are algebraic over F, and let Fᵢ denote 
the field F(α₁, ..., αᵢ) generated by the first i of the elements. These fields form a chain 
F = F₀ ⊂ F₁ ⊂ ··· ⊂ Fₖ = K. Since αᵢ is algebraic over F, it is also algebraic over the 
larger field Fᵢ₋₁. Therefore the degree [Fᵢ : Fᵢ₋₁] is finite for every i. By the multiplicative 
property, [K : F] is finite. The second assertion is obvious. 


(d) We must show that if α and β are elements of K that are algebraic over F, then α + β, 
αβ, etc., are algebraic over F. This follows from (a) and (c) because they are elements of 
the field F(α, β). □ 


Corollary 15.3.7 Let K be an extension field of F of prime degree p. If an element α of K 
is not in F, then α has degree p over F and K = F(α). □ 


Corollary 15.3.8 Let 𝒦 be an extension field of a field F, let K and F′ be subfields of 𝒦 that 
are finite extensions of F, and let K′ denote the subfield of 𝒦 generated by the two fields K 
and F′ together. Let [K′ : F] = N, [K : F] = m and [F′ : F] = n. Then m and n divide N, 
and N ≤ mn. 


Proof. The multiplicative property shows that m and n divide N. Next, suppose that F′ 
is generated over F by one element: F′ = F(β) for some element β. Then K′ = K(β). 
Corollary 15.3.6(b) shows that the degree of β over K, which is equal to [K′ : K], is at most 
equal to the degree of β over F, which is n. The multiplicative property shows that N ≤ mn. 
The case that F′ is generated by several elements follows by induction, when one adjoins one 
element at a time. □ 


The diagram below sums up the corollary: 


(15.3.9) [Diagram: K′ at the top, K and F′ in the middle, F at the bottom. The lower edges are labeled m = [K : F] and n = [F′ : F]; the upper edges are labeled ≤ n for [K′ : K] and ≤ m for [K′ : F′].] 

It follows from the corollary that the degree N of K′ over F is divisible by the least common 
multiple of m and n, and that if m and n are relatively prime, then N = mn. 
It might be tempting to guess that N divides mn, but this isn't always true. 


Examples 15.3.10 


(a) The three complex roots of x³ − 2 are α₁ = α, α₂ = ωα, and α₃ = ω²α, where α = ∛2 
and ω = e^(2πi/3). Each of the roots αᵢ has degree 3 over Q, but Q(α₁, α₂) = Q(α, ω), 
and since ω has degree 2 over Q, [Q(α₁, α₂) : Q] = 6. 

(b) Let α = ∛2 and let β be a root of the irreducible polynomial x⁴ + x + 1 over Q. Because 
3 and 4 are relatively prime, Q(α, β) has degree 12 over Q. Therefore α is not in the 
field Q(β). On the other hand, since i has degree 2 over Q, it is not so easy to decide 
whether or not i is in Q(β). (It is not.) 

(c) Let K = Q(√2, i) be the field generated over Q by √2 and i. Both i and √2 have degree 
2 over Q, and since i is not a real number, it is not in Q(√2). So [Q(√2, i) : Q] = 4. Therefore the 
degree of i over Q(√2) is 2. Since √−2 and i also generate K, i is not in the field Q[√−2] 
either. □ 


15.4 FINDING THE IRREDUCIBLE POLYNOMIAL 


Let γ be an element of an extension field K of F, and assume that γ is algebraic over 
F. There are two general methods to find the irreducible polynomial f(x) for γ over 
F. One is to compute the powers of γ and to look for a linear relation among them. 
Sometimes, though not very often, one can guess the other roots of f, say γ₁, ..., γₖ, with 
γ = γ₁. Then expanding the product (x − γ₁)···(x − γₖ) will produce the polynomial. 
We'll give an example to illustrate the two methods, in which F is the field Q of rational 
numbers. 


Example 15.4.1 Let γ = √2 + √3. We compute powers of γ, and simplify when possible: 
γ² = 5 + 2√6, γ⁴ = 49 + 20√6. We won't need the other powers because we can eliminate 
√6 from these two equations, obtaining the relation γ⁴ − 10γ² + 1 = 0. Thus γ is a root of 
the polynomial g(x) = x⁴ − 10x² + 1. □ 
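
The powers in this example can be checked by exact arithmetic. In the sketch below, an element a₀ + a₁√2 + a₂√3 + a₃√6 is stored as the tuple (a₀, a₁, a₂, a₃); this representation and the function name mul are our own conventions for the illustration.

```python
def mul(u, v):
    """Multiply two elements written on (1, √2, √3, √6),
    using √2·√3 = √6, √2·√6 = 2√3, √3·√6 = 3√2."""
    a0, a1, a2, a3 = u
    b0, b1, b2, b3 = v
    return (
        a0 * b0 + 2 * a1 * b1 + 3 * a2 * b2 + 6 * a3 * b3,   # coefficient of 1
        a0 * b1 + a1 * b0 + 3 * (a2 * b3 + a3 * b2),         # coefficient of √2
        a0 * b2 + a2 * b0 + 2 * (a1 * b3 + a3 * b1),         # coefficient of √3
        a0 * b3 + a3 * b0 + a1 * b2 + a2 * b1,               # coefficient of √6
    )

gamma = (0, 1, 1, 0)                 # √2 + √3
gamma2 = mul(gamma, gamma)           # 5 + 2√6
gamma4 = mul(gamma2, gamma2)         # 49 + 20√6
one = (1, 0, 0, 0)
relation = tuple(g4 - 10 * g2 + e for g4, g2, e in zip(gamma4, gamma2, one))
print(gamma2, gamma4, relation)
```

The tuple relation holds the coordinates of γ⁴ − 10γ² + 1, and all four of them vanish.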


Two important elementary observations are implicit here: 


Lemma 15.4.2 

(a) A linear dependence relation cₙγⁿ + ··· + c₁γ + c₀ = 0 among powers of an element γ 
means that γ is a root of the polynomial cₙxⁿ + ··· + c₁x + c₀. 

(b) Let α and β be algebraic elements of an extension field of F, and let their degrees over 
F be d₁ and d₂, respectively. The d₁d₂ monomials αⁱβʲ, with 0 ≤ i < d₁ and 0 ≤ j < d₂, 
span F(α, β) as F-vector space. 


Proof. Though important, (a) is trivial. To prove (b), we note that because α and β are 
algebraic over F, F(α, β) = F[α, β] (15.2.6). The monomials listed span F[α, β]. □ 

Example 15.4.3 The alternate approach to Example 15.4.1 is to guess that the roots of g 
might be γ₁ = √2 + √3, γ₂ = −√2 − √3, γ₃ = −√2 + √3, and γ₄ = √2 − √3. Expanding 
the polynomial with these roots, we find 


(x − γ₁)(x − γ₂)(x − γ₃)(x − γ₄) 
= (x² − (√2 + √3)²)(x² − (√2 − √3)²) = x⁴ − 10x² + 1. 


This is the polynomial that we obtained before. □ 


The lemma shows that one can always produce a polynomial having an element such 
as γ = α + β as a root, provided that the irreducible polynomials for α and β are known. 
Say that α and β have degrees d₁ and d₂ over F, respectively. Given any element γ of 
F(α, β), we write its powers 1, γ, γ², ..., γⁿ as linear combinations of the monomials 
αⁱβʲ with 0 ≤ i < d₁ and 0 ≤ j < d₂. When n = d₁d₂, we get n + 1 powers γᵏ that 
are linear combinations of n monomials. So the powers are linearly dependent. A linear 
dependence relation determines a polynomial with coefficients in F with γ as a root. 
However, there is a point that complicates matters. The polynomial with root γ that we 
find in this way may be reducible. The irreducible polynomial for γ over F is the lowest 
degree polynomial with root γ. To determine it by this method, we would need a basis for K 
over F. 
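
As a sketch of this method, the code below searches for the lowest-degree linear relation among the powers of an element γ of K = Q(√2, √3). It assumes that (1, √2, √3, √6) is a basis of K over Q, so that coordinate vectors are unique; under that assumption the first dependence found gives the irreducible polynomial. The function names mul, solve, and lowest_relation are hypothetical helpers written for this illustration.

```python
from fractions import Fraction

def mul(u, v):
    """Multiply elements given by coordinates on (1, √2, √3, √6)."""
    a0, a1, a2, a3 = u
    b0, b1, b2, b3 = v
    return (
        a0 * b0 + 2 * a1 * b1 + 3 * a2 * b2 + 6 * a3 * b3,
        a0 * b1 + a1 * b0 + 3 * (a2 * b3 + a3 * b2),
        a0 * b2 + a2 * b0 + 2 * (a1 * b3 + a3 * b1),
        a0 * b3 + a3 * b0 + a1 * b2 + a2 * b1,
    )

def solve(cols, target):
    """Solve sum_k c_k * cols[k] = target over Q by Gauss-Jordan elimination.
    Returns the list of c_k, or None if the system is inconsistent."""
    rows, n = len(target), len(cols)
    M = [[Fraction(cols[k][r]) for k in range(n)] + [Fraction(target[r])]
         for r in range(rows)]
    pivots, pr = [], 0
    for col in range(n):
        hit = next((r for r in range(pr, rows) if M[r][col] != 0), None)
        if hit is None:
            continue
        M[pr], M[hit] = M[hit], M[pr]
        piv = M[pr][col]
        M[pr] = [x / piv for x in M[pr]]
        for r in range(rows):
            if r != pr and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[pr])]
        pivots.append(col)
        pr += 1
    if any(all(x == 0 for x in row[:n]) and row[n] != 0 for row in M):
        return None
    sol = [Fraction(0)] * n
    for r, col in enumerate(pivots):
        sol[col] = M[r][n]
    return sol

def lowest_relation(gamma, max_degree=4):
    """Coefficients (lowest degree first) of the monic polynomial of least
    degree having gamma as a root, or None if none has degree <= max_degree."""
    powers = [(Fraction(1), Fraction(0), Fraction(0), Fraction(0))]
    for n in range(1, max_degree + 1):
        powers.append(mul(powers[-1], gamma))
        c = solve(powers[:n], powers[n])     # try gamma^n = sum of lower powers
        if c is not None:
            return [-x for x in c] + [Fraction(1)]
    return None

relation = lowest_relation((0, 1, 1, 0))     # gamma = √2 + √3
print(relation)
```

For γ = √2 + √3 no relation exists in degrees 1, 2, or 3, and the degree 4 relation found is x⁴ − 10x² + 1. Calling lowest_relation((0, 1, 0, 0)) returns the coefficients of x² − 2.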


Examples 15.4.4 

(a) In Example 15.4.1, where α = √2, β = √3 and d₁ = d₂ = 2, the elements αⁱβʲ with 
i, j < 2 are 1, √2, √3, and √6. These elements do form a basis of K over Q. The 
polynomial x⁴ − 10x² + 1 is irreducible. 

(b) We go back to Example 15.3.10(a), in which the three roots of the polynomial x³ − 2 
are labeled αᵢ, i = 1, 2, 3. Let F = Q, L = Q(α₁) and K = Q(α₁, α₂). Each of the 
roots αᵢ has degree 3 over F. According to the lemma, the nine monomials α₁ⁱα₂ʲ with 
0 ≤ i, j < 3 span K over F. However, these monomials aren't independent. Since f(x) = x³ − 2 has 
a root α₁ in the field L, it factors in L[x], say f(x) = (x − α₁)q(x). Then α₂ is a root of 
q(x), so α₂ has degree at most 2 over L. The set (1, α₂) is a basis for K over the field 
L, so the six monomials α₁ⁱα₂ʲ with 0 ≤ i < 3 and 0 ≤ j < 2 form a basis for K over F. 
If we want a basis of monomials, we should use this one. □ 


15.5 RULER AND COMPASS CONSTRUCTIONS 


Famous theorems assert that certain geometric constructions cannot be done with ruler and 
compass alone. To illustrate these theorems, we use the concept of degree of a field extension 
to prove the impossibility of trisection of an angle. 


Here are the rules for ruler and compass construction: 


(15.5.1) 


• Two points in the plane are given to start with. These points are constructed. 

• If two points p₀, p₁ have been constructed, we may draw the line through them, or 
draw a circle with center at p₀ and passing through p₁. Such lines and circles are 
then constructed. 

• The points of intersection of constructed lines and circles are constructed. 


Points, lines, and circles will be called constructible if they can be obtained in finitely many 
steps, using these rules. 

Notice that our ruler may be used only to draw lines through constructed points. 
We are not allowed to use it for measurement. Sometimes the ruler is referred to as a 
“straight-edge” to emphasize this point. 

We begin with some familiar constructions. In each figure, the lines and circles are to 
be drawn in the order indicated. The first two constructions make use of a point q on ℓ whose 
only restriction is that it is not on the perpendicular. Whenever we need an arbitrary point, 
we will construct a particular one for the purpose. We can do this because a constructed line 
ℓ contains infinitely many points that can be constructed. 


Construction 15.5.2 Construct a line through a constructed point p and perpendicular to a 
constructed line ℓ. 
Case 1: p ∉ ℓ 


Case 2: p ∈ ℓ 

Construction 15.5.3 Construct a line parallel to a constructed line ℓ and passing through a 
constructed point p. 


Apply Cases 1 and 2 above: 


Construction 15.5.4 Mark off a length defined by two points onto a constructed line ℓ, with 
endpoint p. 


Use the construction of parallels: 


[Figure: a given length, and the same length marked off on ℓ.] 


We introduce Cartesian coordinates into the plane so that the points that are given at 
the start have coordinates (0, 0) and (1, 0). 


Proposition 15.5.5 


(a) Let p₀ = (a₀, b₀) and p₁ = (a₁, b₁) be points whose coordinates aᵢ and bᵢ are in a 
subfield F of the field of real numbers. The line through p₀ and p₁ is defined by a linear 
equation with coefficients in F. The circle with center p₀ and passing through p₁ is 
defined by a quadratic equation with coefficients in F. 

(b) Let A and B be lines or circles defined by linear or quadratic equations, respectively, 
that have coefficients in a subfield F of the real numbers. Then the points of intersection 
of A and B have coordinates in F, or in a real quadratic field extension F′ of F. 


Proof. (a) The line through (a₀, b₀) and (a₁, b₁) is the locus of the linear equation 


(a₁ − a₀)(y − b₀) = (b₁ − b₀)(x − a₀). 

The circle with center (a₀, b₀) and passing through (a₁, b₁) is the locus of the quadratic 
equation 
(x − a₀)² + (y − b₀)² = (a₁ − a₀)² + (b₁ − b₀)². 


The coefficients of these equations are in F. 


(b) The point of intersection of two lines is found by solving two linear equations with 
coefficients in F, so its coordinates are in F. To find the intersection of a line and a circle, 
we use the equation of the line to eliminate one variable from the equation of the circle, 
obtaining a quadratic equation in one unknown. This quadratic equation has solutions in 
the field F′ = F(√D), where D is its discriminant. The discriminant is an element of F. If 
F′ ≠ F, then the degree of F′ over F is 2. If D is negative, there is no real solution to the 
equations. Then the line and circle do not intersect. 


Consider the intersection of two circles, say 
(x − a₁)² + (y − b₁)² = c₁ and (x − a₂)² + (y − b₂)² = c₂, 


where aᵢ, bᵢ, and cᵢ are in F. In general, the solution of a pair of quadratic equations in two 
variables requires solving an equation of degree 4. In this case we are lucky: The difference 
of the two quadratic equations is a linear equation. We can use that linear equation to 
eliminate one variable, as before. The lucky event reflects the fact that, whereas a pair of 
conics may intersect in four points, two circles intersect in at most two points. □ 
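
The elimination for two circles can be sketched with exact rational arithmetic. Subtracting the two circle equations gives the line Ax + By + C = 0 with A = 2(a₂ − a₁), B = 2(b₂ − b₁), and C = (a₁² + b₁² − c₁) − (a₂² + b₂² − c₂). Substituting a parametrization of that line into the first circle gives a quadratic in one unknown whose discriminant D lies in F. The function name and the choice of parametrization are assumptions of this sketch, and the centers are assumed to be distinct.

```python
from fractions import Fraction

def circle_intersection_data(circ1, circ2):
    """Each circle is (a, b, c), meaning (x - a)^2 + (y - b)^2 = c.
    Returns (A, B, C, D): the line Ax + By + C = 0 obtained by subtracting
    the equations, and the discriminant D of the resulting quadratic."""
    a1, b1, c1 = map(Fraction, circ1)
    a2, b2, c2 = map(Fraction, circ2)
    A = 2 * (a2 - a1)
    B = 2 * (b2 - b1)
    C = (a1**2 + b1**2 - c1) - (a2**2 + b2**2 - c2)
    n = A * A + B * B                    # nonzero because the centers differ
    px, py = -A * C / n, -B * C / n      # a point on the line
    dx, dy = -B, A                       # a direction vector of the line
    # Substitute (px + t*dx, py + t*dy) into the first circle: qa*t^2 + qb*t + qc = 0
    qa = dx * dx + dy * dy
    qb = 2 * (dx * (px - a1) + dy * (py - b1))
    qc = (px - a1)**2 + (py - b1)**2 - c1
    D = qb * qb - 4 * qa * qc
    return A, B, C, D

# Circles centered at the two given points (0, 0) and (1, 0), each through the other.
data = circle_intersection_data((0, 0, 1), (1, 0, 1))
print(data)
```

Here the line is 2x − 1 = 0 and D = 12, so the intersection points (1/2, ±√3/2) have coordinates in Q(√12) = Q(√3), a real quadratic extension of Q.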


Theorem 15.5.6 Let p be a constructible point. For some integer n, there is a chain of fields 
Q = F₀ ⊂ F₁ ⊂ F₂ ⊂ ··· ⊂ Fₙ = K, such that 


• K is a subfield of the field of real numbers; 

• the coordinates of p are in K; 

• for each i = 0, ..., n − 1, the degree [Fᵢ₊₁ : Fᵢ] is equal to 2. 
Therefore the degree [K : Q] is a power of 2. 


Proof. We introduced coordinates so that the points originally given are (0, 0) and (1,0). 
These points have coordinates in Q. The process of constructing the point p involves a 
sequence of steps, each one of which draws a line or a circle. 

Suppose that all points constructed by the time we are at the kth step have coordinates 
in a field F. The next step constructs a line or circle through two of these points, and 
according to Proposition 15.5.5(a), the line or circle has an equation with coefficients in F. 
The field does not change. Then according to Proposition 15.5.5(b), any point of intersection 
of the lines and circles constructed so far will have coordinates either in F or in a real 
quadratic extension of F. The assertion follows by induction from Proposition 15.5.5 and 
from the multiplicative property of the degree. □ 


• We call a real number a constructible if the point (a, 0) is constructible. Since we 
can construct perpendiculars, this is the same thing as saying that a is the x-coordinate 
of a constructible point. And since we can mark off lengths, a positive real number a is 
constructible if and only if there is a pair p, q of constructible points whose distance apart is a. 

Corollary 15.5.7 Let a be a constructible real number. Then a is an algebraic number, and 
its degree over Q is a power of 2. 


Since a is in a field K that is the end of a chain of fields as in the theorem, and since [K : Q] 
is a power of 2, the degree of a is also a power of 2 (15.3.6). □ 


The converse of this corollary is false. There exist real numbers of degree 4 over Q that 
aren’t constructible. Galois theory provides a way to understand this. (This is Exercise 9.17 
of Chapter 16.) 

We can now prove the impossibility of certain geometric constructions. The method 
is to show that if a certain construction were possible, then it would also be possible to 
construct an algebraic number whose degree over Q is not a power of 2. This would contradict 
the corollary. Our example is the impossibility of trisection of the angle, which asks for a 
construction of the angle θ/3 when θ is given. Now many angles, 45° for instance, can be 
trisected. The trisection problem asks for a general method of construction that will work 
for any “given” angle. 

Since it is easy to construct an angle of 60°, we can give this angle to ourselves, using 
ruler and compass constructions. If trisection were possible, we could construct an angle of 
20°. We will show that it is impossible to construct that particular angle, and therefore that 
there is no general method of trisection. 

We'll say that an angle θ is constructible if it is possible to construct a pair of lines 
meeting with angle θ. If we mark off a unit length on one of the lines and drop a perpendicular 
to the other line, we will have constructed the real number cos θ. Conversely, if cos θ is a 
constructible real number, we can reverse this process to construct a pair of lines meeting 
with angle θ. 

The next lemma shows that 20° = π/9 cannot be constructed. 


Lemma 15.5.8 The real number cos 20° is algebraic over Q and its degree over Q is 3. 
Therefore cos 20° is not a constructible number. 


Proof. Let α = 2 cos θ = e^(iθ) + e^(−iθ), where θ = π/9. Then e^(3iθ) + e^(−3iθ) = 2 cos(π/3) = 1, 
and 

α³ = (e^(iθ) + e^(−iθ))³ = e^(3iθ) + 3e^(iθ) + 3e^(−iθ) + e^(−3iθ) = 1 + 3α, 
so α is a root of the polynomial x³ − 3x − 1. This polynomial is irreducible over Q because it 
has no integer root: a reducible cubic would have a rational root, and a rational root of a 
monic integer polynomial is an integer. It is therefore the irreducible polynomial for α over Q. 
So α has degree 3 over Q, and so does cos θ. □ 
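
Both claims in this proof are easy to check by machine. The sketch below applies the rational root test to x³ − 3x − 1 and confirms numerically that 2 cos 20° is a root. The function name rational_roots is our own, and it assumes integer coefficients with nonzero constant term.

```python
import math
from fractions import Fraction

def rational_roots(coeffs):
    """Rational roots of an integer polynomial with nonzero constant term.
    coeffs lists the coefficients, lowest degree first. Every rational root
    p/q in lowest terms has p dividing the constant term and q dividing the
    leading coefficient, so only finitely many candidates need testing."""
    def divisors(n):
        n = abs(n)
        return [d for d in range(1, n + 1) if n % d == 0]
    roots = set()
    for p in divisors(coeffs[0]):
        for q in divisors(coeffs[-1]):
            for cand in (Fraction(p, q), Fraction(-p, q)):
                if sum(c * cand**k for k, c in enumerate(coeffs)) == 0:
                    roots.add(cand)
    return roots

f = [-1, -3, 0, 1]                       # x^3 - 3x - 1
roots = rational_roots(f)                # empty: the only candidates are 1 and -1
alpha = 2 * math.cos(math.pi / 9)        # 2 cos 20 degrees, as a float
residual = alpha**3 - 3 * alpha - 1      # zero up to rounding error
print(roots, residual)
```

Since a cubic with no rational root has no linear factor over Q, the empty result confirms irreducibility. The numerical residual is only a floating-point sanity check, not a proof that α is a root.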


One more example: The regular 7-gon cannot be constructed. This is similar to the 
above problem, because constructing 20° is equivalent to constructing the 18-gon. We'll vary 
the approach slightly. Let θ = 2π/7 and let ζ = e^(iθ). Then ζ is a seventh root of unity, a 
root of the irreducible polynomial x⁶ + x⁵ + ··· + 1 (Theorem 12.4.9), so ζ has 
degree 6 over Q. If the 7-gon were constructible, then cos θ and sin θ would be constructible 
numbers. They would lie in a real field extension K of Q whose degree over Q is a power of 2, say 
2ᵏ. Consider the extension K(i) of K. This extension has degree 2, so 
[K(i) : Q] = 2ᵏ⁺¹. But ζ = cos θ + i sin θ is in K(i). This contradicts the fact that the degree 
of ζ is 6, because 6 does not divide 2ᵏ⁺¹ (Corollary 15.3.6(a)). 

The argument we have used is not special to the number 7. It applies to any prime 
integer p, provided that p − 1, the degree of the irreducible polynomial xᵖ⁻¹ + ··· + x + 1, 
is not a power of 2. 


Corollary 15.5.9 Let p be a prime integer. If the regular p-gon can be constructed with ruler 
and compass, then p = 2” + 1 for some integer r. O 


Gauss proved the converse: If a prime has the form $2^r + 1$, then the regular p-gon can be
constructed. The regular 17-gon, for example, can be constructed by ruler and compass. We
will learn how to prove this in the next chapter (see Corollary 16.10.5).
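To see how restrictive the condition on p is, here is a small sketch that lists the odd primes below 300 for which p - 1 is a power of 2. The helper names and the bound 300 are our own choices for illustration.

```python
def is_prime(n):
    """Trial division; adequate for the small range used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def is_power_of_two(m):
    """True when m = 2^r for some integer r >= 0."""
    return m >= 1 and (m & (m - 1)) == 0

# Odd primes p < 300 with p = 2^r + 1.
candidates = [p for p in range(3, 300)
              if is_prime(p) and is_power_of_two(p - 1)]
print(candidates)
```

The primes 7, 11, and 13 are absent from the list, so the corresponding regular polygons cannot be constructed, while 17 passes the test.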


To complete the discussion, we prove a converse to Theorem 15.5.6. 


Theorem 15.5.10 Let $Q = F_0 \subset F_1 \subset \cdots \subset F_n = K$ be a chain of subfields of the field R
of real numbers with the property that for each $i = 0, \ldots, n-1$, $[F_{i+1} : F_i] = 2$. Then every
element of K is constructible.


Since any extension of degree 2 can be obtained by adjoining a square root, the theorem 
follows from the next lemma. 


Lemma 15.5.11 


(a) The constructible numbers form a subfield of R. 
(b) If a is a positive constructible number, then so is $\sqrt{a}$.


Proof. (a) We must show that if a and b are positive constructible numbers, then $a + b$, $-a$,
$ab$, and $a^{-1}$ (if $a \neq 0$) are also constructible. The closure in case a or b is negative follows
easily. Addition and subtraction are done by marking lengths on a line. For multiplication
and division, we use similar right triangles.


Given one triangle and one side of a second triangle, the second triangle can be constructed
by parallels. Writing r, s and r', s' for corresponding sides of the two similar triangles, we have
$s/r = s'/r'$. To construct the product ab, we take r = 1, s = a, and r' = b. Then s' = ab. To
construct $a^{-1}$, we take r = a, s = 1, and r' = 1. Then $s' = a^{-1}$.


(b) We use similar triangles again. We must construct them so that r = a, r' = s, and
s' = 1. Then $s = \sqrt{a}$. How to make the construction is less obvious this time, but we can
use inscribed triangles in a circle. A triangle inscribed into a circle, with a diameter as its
hypotenuse, is a right triangle. This is a theorem of high school geometry, and it can be
checked using the equation for a circle and Pythagoras’s theorem. So we construct a circle
whose diameter is 1 + a and proceed as in the figure below.

15.6 ADJOINING ROOTS


Up to this point, we have used subfields of the complex numbers as our examples. Abstract 
constructions are not needed to create these fields, except that the construction of the 
complex number field as an extension of the real number field is abstract. We simply adjoin 
complex numbers to the rational numbers as desired, and work with the subfield they 
generate. But finite fields and function fields are not subfields of a familiar, all-encompassing 
field analogous to C, so these fields must be constructed. The fundamental tool for their 
construction is the adjunction of elements to a ring, which was described in Chapter 11. It is 
applied here to the case that the ring we start with is a field. 

We review the construction. Given a polynomial f(x) with coefficients in a field F, we 
may adjoin a root of f to F. The procedure is to form the quotient ring 


(15.6.1) $K = F[x]/(f)$


of the polynomial ring F[x]. This construction always yields a ring K and a homomorphism
$F \to K$, such that the residue $\bar{x}$ of x satisfies the relation $f(\bar{x}) = 0$ (11.5.2). However, we
want to construct not only a ring, but a field. Here the theory of polynomials over a field
comes into play. It tells us that the principal ideal (f) in the polynomial ring F[x] is a
maximal ideal if and only if f is an irreducible polynomial (12.2.8). Therefore K will be a
field if and only if f is irreducible (11.8.2).


Lemma 15.6.2 Let F be a field, and let f be an irreducible polynomial in F[x]. Then the
ring $K = F[x]/(f)$ is an extension field of F, and the residue $\bar{x}$ of x is a root of f(x) in K.


Proof. The ring K is a field because (f) is a maximal ideal, and the homomorphism
$F \to K$, which sends the elements of F to the residues of the constant polynomials, is
injective because F is a field (11.3.20). So we may identify F with its image, a subfield of K.
The field K becomes an extension of F by means of this identification. Finally, $\bar{x}$ satisfies
the equation $f(\bar{x}) = 0$. It is a root of f (see (11.5.2)). □


• A polynomial f splits completely in a field K if it factors into linear factors in K.


Proposition 15.6.3 Let F be a field, and let f(x) be a monic polynomial in F[x] of positive 
degree. There exists a field extension K of F such that f(x) splits completely in K. 


Proof. We use induction on the degree of f. The first case is that f has a root $\alpha$ in F, so
that $f(x) = (x - \alpha)q(x)$ for some polynomial q. If so, we replace f by q, and we are done
by induction. Otherwise, we choose an irreducible factor g of f. By Lemma 15.6.2, there is
a field extension $F_1$ of F in which g has a root $\alpha$. Then $\alpha$ is a root of f too. We replace F
by $F_1$, and this reduces us to the first case. □


As we see, the polynomial ring F[x] is an important tool for studying extensions of 
a field F. When we are working with field extensions, there is an interplay between the 
polynomial rings over the fields. This interplay doesn’t present serious difficulties, but instead
of scattering the relevant points throughout the text, we have collected them
into the next proposition.


Proposition 15.6.4 Let f and g be polynomials with coefficients in a field F, with $f \neq 0$, and
let K be an extension field of F.


(a) The polynomial ring K[x] contains F[x] as a subring, so computations made in the ring
F[x] are also valid in K[x].

(b) Division with remainder of g by f gives the same answer, whether carried out in F[x]
or in K[x].

(c) f divides g in K[x] if and only if f divides g in F[x].

(d) The (monic) greatest common divisor d of f and g is the same, whether computed in
F[x] or in K[x].

(e) If f and g have a common root in K, they are not relatively prime in F[x]. If f and
g are not relatively prime in F[x], there exists an extension field in which they have a
common root.

(f) If f is an irreducible element of F[x] and if f and g have a common root in K, then f
divides g in F[x].


Proof. (a) This is obvious. 


(b) Carry out the division in F[x]: $g = fq + r$. This equation remains true in the bigger ring
K[x], and since division with remainder in K[x] is unique, carrying the division out in K[x]
leads to the same result.


(c) This is (b) in the case that the remainder is zero. 


(d) Let d and d' denote the greatest common divisors of f and g in F[x] and K[x],
respectively. Then d is a common divisor in K[x], and since d' is the greatest common divisor
in K[x], d divides d'. In addition, we know that d has the form $d = pf + qg$, for some
elements p and q in F[x]. Since d' divides f and g, d' divides d. Thus d and d' are associates
in K[x], and since they are monic polynomials, they are equal.


(e) Let $\alpha$ be a common root of f and g in K. Then $x - \alpha$ is a common divisor of f and
g in K[x]. So the greatest common divisor of f and g in K[x] isn’t 1. By (d), it isn’t 1 in
F[x] either. Conversely, if f and g have a common divisor d of positive degree, there is an
extension field of F in which d has a root. This root will be a common root of f and g.


(f) If f is irreducible, its only monic divisors in F[x] are 1 and f. Part (e) tells us that the
greatest common divisor of f and g in F[x] isn’t 1. Therefore it is f. □


The final topic of this section is the derivative f’(x) of a polynomial f(x). The 
derivative is computed using the rules from calculus for differentiating polynomial functions. 
In other words, if $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$, then

(15.6.5) $f'(x) = n a_n x^{n-1} + (n-1) a_{n-1} x^{n-2} + \cdots + a_1$.


The integer coefficients in this formula are interpreted as the elements $1 + \cdots + 1$ of F. So if
f has coefficients in a field F, its derivative does too. It can be shown that familiar rules of
differentiation, such as the product rule, hold. (This is Exercise 3.5.)


The derivative can be used to recognize multiple roots of a polynomial. 


Lemma 15.6.6 Let f be a polynomial with coefficients in a field F. An element $\alpha$ in an
extension field K of F is a multiple root, meaning that $(x - \alpha)^2$ divides f, if and only if it is
a root of f and also a root of f'.


Proof. If $\alpha$ is a root of f, then $x - \alpha$ divides f, say $f(x) = (x - \alpha)g(x)$. Then $\alpha$ is a multiple
root of f if and only if it is a root of g. By the product rule for differentiation,


$f'(x) = (x - \alpha)g'(x) + g(x)$.

Substituting $x = \alpha$, one sees that $f'(\alpha) = 0$ if and only if $g(\alpha) = 0$. □


Proposition 15.6.7 Let f(x) be a polynomial with coefficients in F. There exists a field
extension K of F in which f has a multiple root if and only if f and f' are not relatively
prime.


Proof. If f has a multiple root in K, then f and f' have a common root in K, so they are
not relatively prime in K[x] or in F[x]. Conversely, if f and f' are not relatively prime, then they
have a common root in some field extension K, hence f has a multiple root there. □
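The criterion can be tested mechanically with the Euclidean algorithm. The following is a minimal sketch over Q, using exact rational arithmetic; the representation (coefficient lists, lowest degree first) and all function names are our own choices, not notation from the text.

```python
from fractions import Fraction

def trim(p):
    """Copy p and drop trailing zero coefficients (lowest degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def derivative(p):
    """Formal derivative: the coefficient of x^(k-1) is k * p[k]."""
    return trim([k * c for k, c in enumerate(p)][1:])

def poly_rem(g, f):
    """Remainder of g on division by the nonzero polynomial f."""
    g, f = trim(g), trim(f)
    while len(g) >= len(f):
        factor = g[-1] / f[-1]
        shift = len(g) - len(f)
        for i, c in enumerate(f):
            g[shift + i] -= factor * c
        g = trim(g)
    return g

def monic_gcd(f, g):
    """Monic greatest common divisor, by the Euclidean algorithm (f nonzero)."""
    f, g = trim(f), trim(g)
    while g:
        f, g = g, poly_rem(f, g)
    return [c / f[-1] for c in f]

def has_multiple_root(coeffs):
    """True when f and f' are not relatively prime."""
    f = [Fraction(c) for c in coeffs]
    return len(monic_gcd(f, derivative(f))) > 1

print(has_multiple_root([2, -3, 0, 1]))    # x^3 - 3x + 2 = (x - 1)^2 (x + 2)
print(has_multiple_root([-1, -3, 0, 1]))   # x^3 - 3x - 1
```

By Proposition 15.6.4(d), the greatest common divisor computed here in Q[x] is the same as the one computed in any extension field, which is why no roots need to be found.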


Here is one of the most important applications of the derivative to field theory: 


Proposition 15.6.8 Let f be an irreducible polynomial in F[x]. 


(a) f has no multiple root in any field extension of F unless the derivative f' is the zero
polynomial.


(b) If F is a field of characteristic zero, then f has no multiple root in any field extension of 
F. 


Proof. (a) We must show that f and f' are relatively prime unless f' is the zero polynomial.
Since it is irreducible, f will have a nonconstant factor in common with another polynomial
g only if f divides g. And if f divides g, then unless g = 0, the degree of g will be at least
as large as the degree of f. If the derivative f' isn’t zero, its degree is less than the degree
of f, and then f and f' have no common nonconstant factor.


(b) In a field of characteristic zero, the derivative of a nonconstant polynomial isn’t zero. □


The derivative of a nonconstant polynomial f may be zero when F has prime
characteristic p. This happens when the exponent of every monomial that occurs in f is
divisible by p. A typical polynomial whose derivative is zero in characteristic 5 is


$f(x) = x^{15} + a x^{10} + b x^5 + c$,

where a, b, c can be any elements of F. Since the derivative of this polynomial is identically
zero, all of its roots in an extension field will be multiple roots.
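The vanishing of such a derivative is easy to check by machine. In the sketch below, the polynomial $x^{15} + 2x^{10} + 3x^5 + 4$ is an illustrative choice of ours with every exponent divisible by 5, and `derivative_mod_p` is a hypothetical helper name.

```python
def derivative_mod_p(coeffs, p):
    """Formal derivative with coefficients reduced modulo the prime p.

    coeffs[k] is the coefficient of x^k.
    """
    return [(k * c) % p for k, c in enumerate(coeffs)][1:]

# f(x) = x^15 + 2x^10 + 3x^5 + 4
f = [0] * 16
f[15], f[10], f[5], f[0] = 1, 2, 3, 4

in_char_5 = derivative_mod_p(f, 5)   # every term k * a_k has k divisible by 5
in_char_7 = derivative_mod_p(f, 7)   # the same integer coefficients, read modulo 7

print(all(c == 0 for c in in_char_5), any(c != 0 for c in in_char_7))
```

The same coefficient list has a nonzero derivative modulo 7, so the phenomenon depends on the characteristic and not on the polynomial alone.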


15.7 FINITE FIELDS 


In this section, we describe the fields of finite order. The characteristic of a finite field K
cannot be zero, so it is a prime integer (3.2.10), and therefore K will contain one of the prime
fields $F = \mathbb{F}_p$. Since K is finite, it will be finite-dimensional when considered as a vector
space over this field.

Let r denote the degree [K : F]. As an F-vector space, K is isomorphic to the space
$F^r$ of column vectors, which contains $p^r$ elements. So the order of a finite field, the number
of its elements, is a power of a prime. It is customary to use the letter q for this order:


(15.7.1) $|K| = p^r = q$.


In this section, q will denote a positive power of a prime integer p. Fields of order q are
often denoted by $\mathbb{F}_q$. We are going to show that all finite fields of order q are isomorphic, so
this notation isn’t too ambiguous, though when r > 1 the isomorphism between two of them
will not be unique.

The simplest example of a finite field other than a prime field is the field $\mathbb{F}_4$ of order 4.
Let K denote this field, and let $F = \mathbb{F}_2$. There is just one irreducible polynomial of degree 2
in F[x], namely $x^2 + x + 1$ (12.4.4), and K is obtained by adjoining a root $\alpha$ of this polynomial
to F:

$K = F[x]/(x^2 + x + 1)$.

Because the element $\alpha$, the residue of x, has degree 2, the set $(1, \alpha)$ forms a basis of K over
F (15.2.7). The elements of K are the four linear combinations of the basis, with coefficients
modulo 2:


(15.7.2) $K = \{0, 1, \alpha, 1 + \alpha\}$.


The element $1 + \alpha$ is the other root of f(x) in K. Computation in $\mathbb{F}_4$ is made using the
relations $1 + 1 = 0$ and $\alpha^2 + \alpha + 1 = 0$.
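These two relations are all that is needed to compute. As a minimal sketch, we can encode an element $c_0 + c_1\alpha$ of the field of order 4 as the pair `(c0, c1)`; the encoding and the function names are our own.

```python
def add(u, v):
    """Add coefficientwise, modulo 2."""
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def mul(u, v):
    """(a + b*alpha)(c + d*alpha) = ac + (ad + bc)*alpha + bd*alpha^2,
    reduced with alpha^2 = alpha + 1 (from alpha^2 + alpha + 1 = 0 and 1 + 1 = 0)."""
    a, b = u
    c, d = v
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)

zero, one, alpha = (0, 0), (1, 0), (0, 1)
alpha_plus_one = add(one, alpha)

def f(u):
    """Evaluate x^2 + x + 1 at u."""
    return add(add(mul(u, u), u), one)

print(f(alpha), f(alpha_plus_one), mul(alpha, alpha_plus_one))
```

Both $\alpha$ and $1 + \alpha$ are roots of $x^2 + x + 1$, and their product is 1, so each is the inverse of the other. By contrast, in the ring Z/(4) the element 2 satisfies 2 · 2 = 0 and has no inverse.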


Try not to confuse the field $\mathbb{F}_4$ with the ring Z/(4), which isn’t a field.


Here are the main facts about finite fields: 


Theorem 15.7.3 Let p be a prime integer, and let $q = p^r$ be a positive power of p.


(a) Let K be a field of order q. The elements of K are roots of the polynomial $x^q - x$.


(b) The irreducible factors of the polynomial $x^q - x$ over the prime field $F = \mathbb{F}_p$ are the
irreducible polynomials in F[x] whose degrees divide r.


(c) Let K be a field of order q. The multiplicative group $K^\times$ of nonzero elements of K is a
cyclic group of order $q - 1$.

(d) There exists a field of order q, and all fields of order q are isomorphic.

(e) A field of order $p^r$ contains a subfield of order $p^k$ if and only if k divides r.

Corollary 15.7.4 For every positive integer r, there exists an irreducible polynomial of degree
r over the prime field $\mathbb{F}_p$.


Proof. According to (d), there is a field K of order $q = p^r$. Its degree [K : F] over $F = \mathbb{F}_p$ is
r. According to (c), the multiplicative group $K^\times$ is cyclic. It is obvious that a generator $\alpha$ for
this cyclic group will generate K as extension field, i.e., that $K = F(\alpha)$. Since [K : F] = r, $\alpha$
has degree r over F. So $\alpha$ is the root of an irreducible polynomial of degree r. □


We examine a few examples in which q is a power of 2. The irreducible polynomials of
degree at most 4 over $\mathbb{F}_2$ are listed in (12.4.4).


Examples 15.7.5 


(i) The field $\mathbb{F}_4$ has degree 2 over $\mathbb{F}_2$. Its elements are the roots of the polynomial

(15.7.6) $x^4 - x = x(x - 1)(x^2 + x + 1)$.


Note that the factors of $x^2 - x$ appear, because $\mathbb{F}_4$ contains $\mathbb{F}_2$.
Since we are working in characteristic 2, signs are irrelevant: $x - 1 = x + 1$.


(ii) The field $\mathbb{F}_8$ of order 8 has degree 3 over the prime field $\mathbb{F}_2$. Its elements are the eight
roots of the polynomial $x^8 - x$. The factorization of this polynomial in $\mathbb{F}_2[x]$ is


(15.7.7) $x^8 - x = x(x - 1)(x^3 + x + 1)(x^3 + x^2 + 1)$.


The cubic factors are the two irreducible polynomials of degree 3 in $\mathbb{F}_2[x]$.


To compute in the field $\mathbb{F}_8$, we choose an element $\beta$ in that field, a root of one of
the irreducible cubic factors, say of $x^3 + x + 1$. It will have degree 3 over $\mathbb{F}_2$. Then
$(1, \beta, \beta^2)$ is a basis of $\mathbb{F}_8$ as a vector space over $\mathbb{F}_2$. The elements of $\mathbb{F}_8$ are the eight
linear combinations of this basis with coefficients 0 and 1:


(15.7.8) $\mathbb{F}_8 = \{0, 1, \beta, 1+\beta, \beta^2, 1+\beta^2, \beta+\beta^2, 1+\beta+\beta^2\}$.


Computation in $\mathbb{F}_8$ is done using the relations $1 + 1 = 0$ and $\beta^3 + \beta + 1 = 0$.
Note that $x^2 + x + 1$ is not a factor of $x^8 - x$, and therefore $\mathbb{F}_8$ does not contain $\mathbb{F}_4$. It
couldn’t, because $[\mathbb{F}_8 : \mathbb{F}_2] = 3$, $[\mathbb{F}_4 : \mathbb{F}_2] = 2$, and 2 does not divide 3.

(iii) The field $\mathbb{F}_{16}$ of order 16 has degree 4 over $\mathbb{F}_2$. Its elements are roots of the polynomial
$x^{16} - x$. This polynomial factors in $\mathbb{F}_2[x]$ as


(15.7.9) $x^{16} - x = x(x - 1)(x^2 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^4 + x^3 + 1)(x^4 + x + 1)$.


The three irreducible polynomials of degree 4 in $\mathbb{F}_2[x]$ appear here. The factors of
$x^4 - x$ are also among the factors, because $\mathbb{F}_{16}$ contains $\mathbb{F}_4$. □
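The three factorizations in these examples can be confirmed by multiplying the factors out over the field with two elements. The sketch below encodes a polynomial as an integer whose bit k is the coefficient of $x^k$, so that addition is XOR; this encoding and the function names are our own conventions.

```python
def mul_f2(a, b):
    """Multiply two polynomials with coefficients modulo 2.

    Bit k of each integer is the coefficient of x^k; XOR adds coefficients.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def product(factors):
    out = 1
    for factor in factors:
        out = mul_f2(out, factor)
    return out

X = 0b10  # the polynomial x

# x^4 + x = x (x + 1)(x^2 + x + 1)
p4 = product([0b10, 0b11, 0b111])
# x^8 + x = x (x + 1)(x^3 + x + 1)(x^3 + x^2 + 1)
p8 = product([0b10, 0b11, 0b1011, 0b1101])
# x^16 + x = x (x + 1)(x^2 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^4 + x^3 + 1)(x^4 + x + 1)
p16 = product([0b10, 0b11, 0b111, 0b11111, 0b11001, 0b10011])

print(p4 == (1 << 4) | X, p8 == (1 << 8) | X, p16 == (1 << 16) | X)
```

In each case the degrees of the factors are exactly the divisors of r (with the right multiplicities), in agreement with Theorem 15.7.3(b). Signs do not appear because -1 = 1 in characteristic 2.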


We now begin the proof of Theorem (15.7.3). We let F denote the prime field $\mathbb{F}_p$.


Proof of Theorem 15.7.3(a). (the elements of K are roots of $x^q - x$) Let K be a field of order
q. The multiplicative group $K^\times$ has order $q - 1$. Therefore the order of any element $\alpha$ of $K^\times$
divides $q - 1$, so $\alpha^{q-1} - 1 = 0$, which means that $\alpha$ is a root of the polynomial $x^{q-1} - 1$.
The remaining element of K, zero, is the root of the polynomial x. So every element of K is
a root of $x(x^{q-1} - 1) = x^q - x$. □


Proof of Theorem 15.7.3(c). (the multiplicative group is cyclic) The proof is based on the
Structure Theorem 14.7.3 for abelian groups, which tells us that $K^\times$ is a direct sum of cyclic
groups.

The Structure Theorem was stated with additive notation: A finite abelian group V is a
direct sum $C_1 \oplus \cdots \oplus C_k$ of cyclic subgroups of orders $d_1, \ldots, d_k$, such that each $d_i$ divides
the next: $d_1 \mid d_2 \mid \cdots \mid d_k$. Let $d = d_k$. If $w_i$ is a generator for $C_i$, then $d_i w_i = 0$, and since $d_i$
divides d, $d w_i = 0$. Therefore $dv = 0$ for every element v of V. The order of every element
of V divides d.

Going over to multiplicative notation, $K^\times$ is a direct sum of cyclic subgroups, say
$H_1 \oplus \cdots \oplus H_k$, where $H_i$ has order $d_i$, and $d_1 \mid d_2 \mid \cdots \mid d_k$. With $d = d_k$ as before, the order
of every element $\alpha$ of $K^\times$ divides d, which means that $\alpha^d = 1$. Therefore every element of
$K^\times$ is a root of the polynomial $x^d - 1$. This polynomial has at most d roots in K (12.2.20),
and therefore $|K^\times| = q - 1 \le d$. On the other hand, $|K^\times| = |H_1 \oplus \cdots \oplus H_k| = d_1 \cdots d_k$. So
$d_1 \cdots d_k = |K^\times| = q - 1 \le d$. Since $d = d_k$, the only possibility is that $k = 1$ and $q - 1 = d$.
Therefore $K^\times = H_1$, and $K^\times$ is cyclic. □
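For a concrete instance, we can compute the order of every nonzero element of a field of order 16 and pick out the generators. The sketch assumes the model $\mathbb{F}_2[x]/(x^4 + x + 1)$, with elements encoded as integers 0 to 15 whose bits are the coefficients; the encoding and function names are ours.

```python
MODULUS = 0b10011  # x^4 + x + 1, irreducible over the field with two elements

def mul(a, b):
    """Multiply two elements of F_2[x]/(x^4 + x + 1), encoded as ints 0..15."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:       # degree reached 4: reduce using x^4 = x + 1
            a ^= MODULUS
    return result

def order(a):
    """Multiplicative order of a nonzero element."""
    n, power = 1, a
    while power != 1:
        power = mul(power, a)
        n += 1
    return n

orders = {a: order(a) for a in range(1, 16)}
generators = [a for a, n in orders.items() if n == 15]
print(sorted(set(orders.values())), len(generators))
```

Every order divides 15, and eight elements have order exactly 15, so the multiplicative group is cyclic, as the theorem asserts. The residue of x (encoded as 2) is one of the generators in this model.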


Proof of Theorem 15.7.3(d). (existence of a field with q elements) Since we have proved (a),
we know that the elements of a field of order q will be roots of the polynomial $x^q - x$.
There exists a field extension L of F in which this polynomial splits completely (15.6.3).
The natural thing to try is to take such a field L and hope for the best, that the roots of
$x^q - x$ in L form the subfield K that we are looking for. This is shown by Lemma 15.7.11
below.


Lemma 15.7.10 Let F be a field of prime characteristic p, and let $q = p^r$ be a positive power
of p.


(a) The polynomial $x^q - x$ has no multiple root in any field extension of F.
(b) In the polynomial ring F[x, y], $(x + y)^q = x^q + y^q$.


Proof. (a) The derivative of $x^q - x$ is $q x^{q-1} - 1$. In characteristic p, the coefficient q
is equal to 0, so the derivative is $-1$. Since the constant polynomial $-1$ has no root, $x^q - x$
and its derivative have no common root, and therefore $x^q - x$ has no multiple root (Lemma
15.6.6).


(b) We expand $(x + y)^p$ in Z[x, y]:

$(x + y)^p = x^p + \binom{p}{1} x^{p-1} y + \binom{p}{2} x^{p-2} y^2 + \cdots + \binom{p}{p-1} x y^{p-1} + y^p$.

Lemma 12.4.8 tells us that the binomial coefficients $\binom{p}{r}$ are divisible by p for r in the range
$1 \le r < p$. Since F has characteristic p, the map $Z[x, y] \to F[x, y]$ sends these coefficients
to zero, and $(x + y)^p = x^p + y^p$ in F[x, y]. The fact that $(x + y)^q = x^q + y^q$ when $q = p^r$
follows by induction. □
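The divisibility of the middle binomial coefficients is easy to check for small primes, and it fails for composite exponents. The sketch below uses `math.comb` from the Python standard library (available from Python 3.8); the function name is ours.

```python
from math import comb

def middle_binomials_divisible(n):
    """True when n divides every binomial coefficient C(n, r) with 0 < r < n."""
    return all(comb(n, r) % n == 0 for r in range(1, n))

primes = [2, 3, 5, 7, 11, 13]
composites = [4, 6, 9]

print([middle_binomials_divisible(p) for p in primes])
print([middle_binomials_divisible(n) for n in composites])
```

For example, C(4, 2) = 6 is not divisible by 4, which is why the identity $(x + y)^n = x^n + y^n$ needs n to be a power of the characteristic.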

Lemma 15.7.11 Let p be a prime and let $q = p^r$ be a positive power of p. Let L be a field
of characteristic p, and let K be the set of roots of $x^q - x$ in L. Then K is a subfield of L.


Proof. Let $\alpha$ and $\beta$ be roots of the polynomial $x^q - x$ in L. We have to show that $\alpha + \beta$, $-\alpha$,
$\alpha\beta$, $\alpha^{-1}$ (if $\alpha \neq 0$), and 1 are roots of the same polynomial. So we assume that $\alpha^q = \alpha$ and
$\beta^q = \beta$. The proofs that $\alpha\beta$, $\alpha^{-1}$, and 1 are roots are obvious enough that we omit them.
Substitution into Lemma 15.7.10(b) shows that $(\alpha + \beta)^q = \alpha^q + \beta^q = \alpha + \beta$.

Finally, we verify that $-1$ is a root of $x^q - x$. Since products of roots are roots, it will
follow that $-\alpha$ is a root. If $p \neq 2$, then q is an odd integer, and it is true that $(-1)^q = -1$.
If p = 2, q is even, and $(-1)^q = 1$. But in this case, the characteristic of L is 2, so
$1 = -1$ in L. □


We must still show that two fields K and K' of the same order $q = p^r$ are isomorphic.
Let $\alpha$ be a generator for the cyclic group $K^\times$. Then $K = F(\alpha)$, so the irreducible polynomial
f for $\alpha$ over F has degree equal to [K : F] = r. Then f generates the ideal of polynomials
in F[x] with root $\alpha$, and since $\alpha$ is also a root of $x^q - x$, f divides $x^q - x$. Since $x^q - x$ splits
completely in K', f has a root $\alpha'$ in K' too. Then $F(\alpha)$ and $F(\alpha')$ are both isomorphic to
$F[x]/(f)$, hence to each other. Counting degrees shows that $F(\alpha') = K'$, so K and K' are
isomorphic. □


Proof of Theorem 15.7.3(e). (subfields of $\mathbb{F}_q$) Let $q = p^r$ and $q' = p^k$. Since
$[\mathbb{F}_q : \mathbb{F}_p] = r$ and $[\mathbb{F}_{q'} : \mathbb{F}_p] = k$, we can’t have $\mathbb{F}_p \subset \mathbb{F}_{q'} \subset \mathbb{F}_q$ unless k divides r.
Suppose that k divides r, say r = ks. Substitution of $y = p^k$ into the equation $y^s - 1 =
(y - 1)(y^{s-1} + \cdots + y + 1)$ shows that $q' - 1$ divides $q - 1$. Let K be a field of order q. Since the multiplicative group
$K^\times$ is cyclic of order $q - 1$, and since $q' - 1$ divides $q - 1$, $K^\times$ contains an element $\beta$ of order
$q' - 1$. The $q' - 1$ powers of this element are roots of $x^{q'-1} - 1$ in K. Therefore $x^{q'} - x$
splits completely in K. Lemma 15.7.11 shows that the roots form a field of order q'. □
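The arithmetic behind this proof can be checked for small cases: $p^k - 1$ divides $p^r - 1$ exactly when k divides r. The following sketch runs through a small range; the bounds and the name `subfield_exponents` are our own choices.

```python
def subfield_exponents(p, r):
    """Exponents k in 1..r for which p^k - 1 divides p^r - 1."""
    return [k for k in range(1, r + 1) if (p**r - 1) % (p**k - 1) == 0]

# Compare with the divisors of r for a few primes and exponents.
agrees = all(
    subfield_exponents(p, r) == [k for k in range(1, r + 1) if r % k == 0]
    for p in (2, 3, 5)
    for r in range(1, 13)
)

print(agrees, subfield_exponents(2, 12))
```

For p = 2 and r = 12 the exponents are the divisors of 12, so a field of order $2^{12}$ has subfields of orders $2, 2^2, 2^3, 2^4, 2^6$, and $2^{12}$, and no others.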


Proof of Theorem 15.7.3(b). (the irreducible factors of $x^q - x$) Let g be an irreducible
polynomial over F of degree k. The polynomial $x^q - x$ factors into linear factors in K
because it has q roots in K. If g divides $x^q - x$, it will also factor into linear factors, so it
will have a root $\beta$ in K. The degree of $\beta$ over F divides [K : F] = r, and is equal to k. So k
divides r. Conversely, suppose that k divides r. Let $\beta$ be a root of g in an extension field of
F. Then $[F(\beta) : F] = k$, and by (e), K contains a subfield isomorphic to $F(\beta)$. Therefore g
has a root in K, and so g divides $x^q - x$.


This completes the proof of Theorem 15.7.3. □


15.8 PRIMITIVE ELEMENTS 


Let K be a field extension of a field F. An element $\alpha$ that generates K/F, i.e., such that
$K = F(\alpha)$, is called a primitive element for the extension. Primitive elements are useful
because computation in $F(\alpha)$ can be done easily, provided that the irreducible polynomial
for $\alpha$ over F is known.


Theorem 15.8.1 Primitive Element Theorem. Every finite extension K of a field F of 
characteristic zero contains a primitive element. 

The statement is true also when F is a finite field, though the proof is different. For an
infinite field of characteristic $p \neq 0$, the theorem requires an additional hypothesis. Since we
won’t be studying such fields, we won’t consider that case.


Proof of the Primitive Element Theorem. Since the extension K/F is finite, K is generated
by a finite set. For example, a basis for K as F-vector space will generate K over F. Say
that $K = F(\alpha_1, \ldots, \alpha_k)$. We use induction on k. There is nothing to prove when k = 1. For
k > 1, induction allows us to assume the theorem true for the field $K_1 = F(\alpha_1, \ldots, \alpha_{k-1})$
generated by the first $k - 1$ elements $\alpha_i$. So we may assume that $K_1$ is generated by a
single element $\beta$. Then K will be generated by the two elements $\alpha_k$ and $\beta$. The proof of
the theorem is thereby reduced to the case that K is generated by two elements. The next
lemma takes care of this case. □


Lemma 15.8.2 Let F be a field of characteristic zero, and let K be an extension field that is
generated over F by two elements $\alpha$ and $\beta$. For all but finitely many c in F, $\gamma = \beta + c\alpha$ is a
primitive element for K over F.


Proof. Let f(x) and g(x) be the irreducible polynomials for $\alpha$ and $\beta$, respectively, over
F, and let $\mathcal{K}$ be a field extension of K in which f and g split completely. Call their roots
$\alpha_1, \ldots, \alpha_m$ and $\beta_1, \ldots, \beta_n$, respectively, with $\alpha = \alpha_1$ and $\beta = \beta_1$.

Since the characteristic is zero, the roots $\alpha_i$ are distinct, as are the roots $\beta_j$ (15.6.8)(b).
Let $\gamma_{ij} = \beta_j + c\alpha_i$, with $i = 1, \ldots, m$ and $j = 1, \ldots, n$. When $(i, j) \neq (k, \ell)$, the equation
$\gamma_{ij} = \gamma_{k\ell}$ holds for at most one c. So for all but finitely many elements c of F, the $\gamma_{ij}$ will
be distinct. We will show that if c avoids these “bad” values, then $\gamma_{11} = \beta_1 + c\alpha_1$ will be a
primitive element. We drop the subscript, and write $\gamma = \beta_1 + c\alpha_1$.

Let $L = F(\gamma)$. To show that $\gamma$ is a primitive element, it will be enough to show that $\alpha_1$
is in L. Then $\beta_1 = \gamma - c\alpha_1$ will be in L too, and therefore L will be equal to K. To begin
with, $\alpha_1$ is a root of f(x). The trick is to use g to cook up a second polynomial with $\alpha_1$ as a
root, namely $h(x) = g(\gamma - cx)$. This polynomial doesn’t have coefficients in F, but because
g is in F[x], c is in F, and $\gamma$ is in L, the coefficients of h are in L.

We inspect the greatest common divisor d of f and h. It is the same, whether computed
in L[x] or in the ring $\mathcal{K}[x]$ over the extension field (15.6.4). Since $f(x) = (x - \alpha_1) \cdots (x - \alpha_m)$ in $\mathcal{K}[x]$, d
is the product of the factors $x - \alpha_i$ that also divide h, i.e., those such that $\alpha_i$ is a common
root of h and f. One common root is $\alpha_1$. If we show that this is the only common root, it
will follow that $d = x - \alpha_1$ and because the greatest common divisor is an element of L[x]
(15.6.4)(d), that $\alpha_1$ is an element of L.

So all we have to do is check that $\alpha_i$ is not a root of h when i > 1. We substitute:
$h(\alpha_i) = g(\gamma - c\alpha_i)$. The roots of g are $\beta_1, \ldots, \beta_n$, so we must check that $\gamma - c\alpha_i \neq \beta_j$
for any j, or that $\beta_1 + c\alpha_1 \neq \beta_j + c\alpha_i$. This is true because c has been chosen so that the
elements $\gamma_{ij}$ are distinct. □
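A standard example makes the argument concrete. Take F = Q, $\alpha = \sqrt{2}$, $\beta = \sqrt{3}$, and c = 1, so that $\gamma = \sqrt{3} + \sqrt{2}$ (this example is ours, not one from the text). Here $f(x) = x^2 - 2$ and $h(x) = g(\gamma - x) = (\gamma - x)^2 - 3$, and $h - f = -2\gamma x + \gamma^2 - 1$ is linear, so the greatest common divisor of f and h is $x - (\gamma^2 - 1)/(2\gamma)$. The sketch below checks numerically that this expression recovers $\sqrt{2}$; floating-point arithmetic confirms it only up to rounding.

```python
import math

alpha = math.sqrt(2)       # root of f(x) = x^2 - 2
beta = math.sqrt(3)        # root of g(x) = x^2 - 3
c = 1
gamma = beta + c * alpha   # candidate primitive element

# gamma is a root of x^4 - 10x^2 + 1, a polynomial of degree 4.
quartic_residual = gamma**4 - 10 * gamma**2 + 1

# The common root of f and h, written as a rational expression in gamma.
alpha_from_gamma = (gamma**2 - 1) / (2 * gamma)
beta_from_gamma = gamma - c * alpha_from_gamma

print(abs(alpha_from_gamma - alpha) < 1e-12,
      abs(beta_from_gamma - beta) < 1e-12)
```

Since both $\sqrt{2}$ and $\sqrt{3}$ are rational expressions in $\gamma$, the field generated by $\gamma$ over Q contains both generators, as the lemma predicts.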


15.9 FUNCTION FIELDS 


In this section we look at function fields, the third class of field extensions mentioned at the
beginning of the chapter. The field C(t) of rational functions in t will be denoted by F. Its
elements are fractions p/q of complex polynomials, with $q \neq 0$. Function fields are finite
field extensions of F.

Let $\alpha$ be a primitive element for an extension field K of F of degree n, and let f be the
irreducible polynomial for $\alpha$ over F, so that $K = F(\alpha)$ is isomorphic to the field $F[x]/(f)$,
with $\alpha$ corresponding to the residue of x. By clearing denominators, we make f into a
primitive polynomial that we write as a polynomial in x:


(15.9.1) $f(t, x) = a_n(t)x^n + \cdots + a_1(t)x + a_0(t)$.


The hypothesis that f is a primitive polynomial means that the coefficients $a_i(t)$ are
polynomials in t with greatest common divisor 1, and that $a_n(t)$ is monic (12.3.9). The
Riemann surface X of such a polynomial was defined in Section 11.9, as the locus of zeros
$\{f = 0\}$ in complex (t, x)-space $C^2$. It was shown there that X is an n-sheeted branched
covering of the complex t-plane T (11.9.16). The branch points are the points $t = t_0$ of T
at which the one-variable polynomial $f(t_0, x)$ has fewer than n roots, which happens when
$f(t_0, x)$ has a multiple root, or when $t_0$ is a root of the leading coefficient $a_n(t)$ of f (11.9.17).

As before, we use the notation X’ for a set obtained by deleting an unspecified finite 
subset from X, and instead of saying that some statement is true except at a finite set of 
points of X, we will say that it is true on X’. 


An isomorphism of extension fields K and L of F was defined in (15.2.9). It is an
isomorphism of fields $\varphi: K \to L$ that restricts to the identity on F:


(15.9.2) $\begin{array}{ccc} K & \xrightarrow{\ \varphi\ } & L \\ \uparrow & & \uparrow \\ F & = & F \end{array}$


The vertical arrows in this diagram are the inclusions of F as a subfield into K and L, and
the long equality symbol stands for the identity map.


• An isomorphism of branched coverings X and Y of T is a continuous, bijective map
$\eta: X' \to Y'$ that is compatible with the projections of these surfaces to T:


(15.9.3) $\begin{array}{ccc} X' & \xrightarrow{\ \eta\ } & Y' \\ \downarrow & & \downarrow \\ T' & = & T' \end{array}$


The primes indicate that we expect to delete finite sets of points from X and Y in order that
the map $\eta$ be defined and bijective.

Speaking a bit loosely, we call a branched covering $\pi: X \to T$ path connected if X' is
path connected, by which we mean that for every finite subset A of X, the set $X - A$ is path
connected.


The object of this section is to explain the next theorem, which describes function fields 
in terms of their Riemann surfaces. 

Theorem 15.9.4 Riemann Existence Theorem. There is a bijective correspondence between
isomorphism classes of function fields of degree n over F and isomorphism classes of
connected, n-sheeted branched coverings of T, such that the class of the field extension K defined
by an irreducible polynomial f(t, x) corresponds to the class of its Riemann surface X.


This theorem gives us a way to decide when two polynomials of the same degree in 
x define isomorphic field extensions. A simple criterion that can often be used is that the 
branch points of their Riemann surfaces must match up. However, the theorem fails to tell 
us how to find a polynomial with a given branched cover as its Riemann surface. It cannot do 
this. Many polynomials define isomorphic field extensions, and finding something is difficult 
when there are many choices. 


The proof of the theorem is too long to include, but one part is rather easy to verify: 


Proposition 15.9.5 Let f(t, x) and g(t, y) be irreducible polynomials in C[t, x] and C[t, y],
respectively. Let $K = F[x]/(f)$ and $L = F[y]/(g)$ be the field extensions they define, and
let X and Y be the Riemann surfaces $\{f = 0\}$ and $\{g = 0\}$. If K/F and L/F are isomorphic
field extensions, then X and Y are isomorphic branched coverings of T.


Proof. The residue of y in $L = F[y]/(g)$, let’s call it $\beta$, is a root of g, i.e., $g(t, \beta) = 0$, and
an F-isomorphism $\varphi: K \to L$ gives us a root of g in K, namely $\gamma = \varphi^{-1}(\beta)$. So $g(t, \gamma) = 0$.
As is true for any element of $K = F[x]/(f)$, $\gamma$ can be represented as the residue modulo
(f) of an element of F[x]. We let u be such an element, and we define the isomorphism
$\eta: X \to Y$ by $\eta(t, x) = (t, u(t, x))$.

We must show that if (t, x) is a point of X, then (t, u) is a point of Y. Since $g(t, \gamma) = 0$
in K and since u is an element of F[x] that represents $\gamma$, g(t, u) is in the ideal (f). There
is an element h of F[x] such that


$g(t, u) = fh$.


If (t, x) is a point of X, then f(t, x) = 0, and so g(t, u) = 0 too. Therefore (t, u) is indeed
a point of Y. However, since u and h are elements of F[x], their coefficients are rational
functions in t that may have denominators. So $\eta$ may be undefined at a finite set of points.


The inverse function to $\eta$ is obtained by interchanging the roles of K and L. □


Cut and Paste 


“Cut and paste” is a procedure to construct or deconstruct a branched covering. 


We go back to our example of the Riemann surface X of the polynomial $x^2 - t$, and
write $x = x_0 + x_1 i$ as before. If we cut X open along the double locus of Figure 11.9.15, the
negative real t-axis, it decomposes into the two parts $x_0 > 0$ and $x_0 < 0$. Each of these parts
projects bijectively to T, provided that we disregard what happens along the cut.

Turning this procedure around, we can construct a branched covering isomorphic to
X in the following way: We stack two copies $S_1$, $S_2$ of the complex plane over T and cut
them open along the negative real axis. These copies of T will be called sheets. Then we glue
side A of the cut on $S_1$ to side B of the cut on $S_2$ and vice versa. (This cannot be done in
three-dimensional space.)

(15.9.6) Figure: Sides A and B.


Suppose we are given an n-sheeted branched covering $X \to T$, and let $A =
\{p_1, \ldots, p_k\}$ be the set of its branch points in T. For $\nu = 1, \ldots, k$, we choose
nonintersecting half lines $C_\nu$ that lead from $p_\nu$ to infinity. We cut T open along these half lines,
and we also cut X open at all points that lie over them.


We should be specific about what we mean by cutting. Let’s agree that cutting T open
means removing all points of the half lines $C_\nu$, including $p_\nu$, and that cutting X open means
removing all points that lie over those half lines.


Lemma 15.9.7 When X is cut open above the half lines $C_\nu$, it decomposes as a union of n
“sheets” $S_1, \ldots, S_n$, which can be numbered arbitrarily. Each sheet projects bijectively to
the cut plane T.


This is true because the cut surface X is an unbranched covering space of the cut plane T,
which is a simply connected set: Any loop in the cut plane can be contracted continuously
to a point. It is intuitively plausible that every unbranched covering of a simply connected
space decomposes completely. The sheet that contains a point p of X consists of all points
that can be joined to p by a path without crossing the cuts. (This is an exercise in [Munkres],
p. 342). □


(15.9.8) Figure: The Cut Plane T.


Now to reconstruct the surface X, we take n copies of the cut plane T, which we call
“sheets” and label S_1, ..., S_n. We stack them up over T. Except for the cuts, the
union of these sheets is our branched covering. We must describe the rule for gluing the
sheets back together along the cuts. On T, we make a loop ℓ_ν that circles a branch point
p_ν in the counterclockwise direction, and we call the side of C_ν we pass through before
crossing C_ν “side A” and the side we pass through after crossing “side B.” We label
the corresponding sides of the sheet S_i as side A_i and side B_i, respectively. Then the rule
for gluing X amounts to instructions that side A_i is glued to side B_j for some j. This rule is
described by the permutation σ_ν of the indices 1, ..., n that sends i ⇝ j.

It seems clear that we can construct a covering using an arbitrary set of permutations
σ_ν, except that what should happen above the branch points themselves is not clear. To
avoid ambiguity, we simply delete all branch points and all points that lie over them.


• Branching Data: For ν = 1, ..., k, a permutation σ_ν of the indices 1, ..., n.
• Gluing Instructions: If σ_ν(i) = j, glue side A_i to side B_j along the cut C_ν.


When the gluing is done no cuts remain, and the union of the sheets is our covering. As is
true of the Riemann surface depicted in Figure 11.9.15, four dimensions will be needed to
do the gluing without self crossings.

If σ_ν is the trivial permutation, then each sheet is glued to itself above C_ν. Then that
cut isn’t needed, and we say that p_ν is not a true branch point.

The next lemma restates the above discussion.


Lemma 15.9.9 Every n-sheeted branched covering X → T is isomorphic to one constructed
by the cut-and-paste process. □


Note: The numbering of the sheets is arbitrary, and the concept of a “‘top sheet” has no 
intrinsic meaning for a Riemann surface. If there were a top sheet, one could define x as a 
single valued function of t by choosing the value on that sheet. One can do this only after 
the Riemann surface has been cut open. Wandering around on X leads from one sheet to 
another. □


Except for the arbitrary numbering of the sheets, the permutations σ_ν are uniquely
determined by the branched covering X. A change of numbering by a permutation ρ will
change each σ_ν to the conjugate ρ^{-1}σ_νρ.


Lemma 15.9.10 Let X and Y be branched coverings constructed by cut and paste, using the
same points p_ν and half lines C_ν. Let the permutations defining their gluing data be σ_ν and
τ_ν, respectively. Then X and Y are isomorphic branched coverings if and only if there is a
permutation ρ such that τ_ν = ρ^{-1}σ_νρ for each ν. □
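For small n, the criterion of Lemma 15.9.10 can be tested by brute force. The following Python sketch is an illustration only and is not part of the text. It assumes that a permutation of the sheets is stored as a tuple on the indices 0, ..., n−1 (the text numbers sheets from 1), and the names compose, inverse, and find_renumbering are hypothetical helpers introduced here.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'first q, then p' (tuples viewed as maps i -> p[i])."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def find_renumbering(sigmas, taus):
    """Search all of S_n for rho with tau_v = rho^{-1} sigma_v rho for every v.

    Returns such a rho, or None if the two sets of gluing data are not
    simultaneously conjugate.
    """
    n = len(sigmas[0])
    for rho in permutations(range(n)):
        rho_inv = inverse(rho)
        if all(compose(rho_inv, compose(s, rho)) == t
               for s, t in zip(sigmas, taus)):
            return rho
    return None
```

For instance, the data (12), (23) and the data (12), (13) on three sheets are related by interchanging the first two sheets, so `find_renumbering([(1, 0, 2), (0, 2, 1)], [(1, 0, 2), (2, 1, 0)])` finds a renumbering. Because ρ ranges over the whole symmetric group, the answer does not depend on the convention chosen for composing permutations.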


Lemma 15.9.11 The branched covering X constructed by cut and paste is path connected if
and only if the permutations σ_1, ..., σ_k generate a subgroup H of the symmetric group that
operates transitively on the indices 1, ..., n.


Proof. Each sheet is path connected. If the permutation σ_ν sends the index i to j, the sheets
S_i and S_j are glued together along the cut C_ν. Then there will be a short path across the cut
that leads from a point of S_i to a point of S_j, and because the sheets themselves are path
connected, all points of S_i ∪ S_j can be connected by paths. So X is path connected if and
only if, for every pair of indices i, j, there is a sequence of the permutations σ_ν that carries
i = i_0 ⇝ i_1 ⇝ ··· ⇝ i_d = j. This will be true if and only if H operates transitively. □

Example 15.9.12 The simplest k-sheeted path connected branched coverings of T are
branched at a single point. Let Y be such a covering, branched only at the origin t = 0.
The branching data for Y consists of a single permutation σ, the one that corresponds to a
loop around the origin. The previous lemma tells us that, since Y is path connected, σ must
operate transitively on the k indices, and the only permutations that operate transitively are
the cyclic permutations of order k. So with suitable numbering of the sheets, σ = (1 2 ··· k).
There is, up to isomorphism, exactly one k-sheeted branched covering branched only at the
origin. The Riemann Existence Theorem tells us that there is, up to isomorphism, a unique
field extension with this Riemann surface. It is not hard to guess this field extension: it is the
one defined by the polynomial y^k − t, i.e., K = F(y), where y = t^{1/k}. The Riemann surface
Y has k sheets. It is branched only at the origin because each t different from zero has k
complex kth roots.

There are two more things to be said here. First, the theorem asserts that this is the only 
field extension of degree k branched at the single point t = 0. This isn’t obvious. Second, 
the same field extension K = F(y) can be generated by many elements. For most choices of 
generators, it would not be obvious that there is only one true branch point. □
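The transitivity condition of Lemma 15.9.11 is easy to test mechanically: starting from one sheet, one collects every sheet that can be reached by crossing cuts. The Python sketch below is an illustration under the assumption that sheets are numbered 0, ..., n−1 and that each permutation is a tuple with sigma[i] = j; the function name is_path_connected is hypothetical.

```python
def is_path_connected(sigmas, n):
    """Return True if the group generated by `sigmas` acts transitively on 0..n-1.

    Each sigma is a tuple with sigma[i] = j, meaning that side A_i is glued
    to side B_j along the corresponding cut.
    """
    reached = {0}
    frontier = [0]
    while frontier:
        i = frontier.pop()
        for s in sigmas:
            # A cut can be crossed in either direction: apply sigma or its inverse.
            for j in (s[i], s.index(i)):
                if j not in reached:
                    reached.add(j)
                    frontier.append(j)
    return len(reached) == n
```

A single k-cycle, as in Example 15.9.12, passes the test, while a single transposition on three sheets does not: `is_path_connected([(1, 2, 3, 0)], 4)` holds and `is_path_connected([(1, 0, 2)], 3)` fails.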


Computing the Permutations 


Given a polynomial f(t, x), one wishes to determine the permutations σ_ν that define the
gluing data of its Riemann surface. Two problems present themselves. First, the “local
problem”: At each branch point p one must determine the permutation σ of the sheets that
occurs when one circles that point. As we have seen, σ depends on the numbering of the
sheets. Second, one must take care to use the same numbering for each branch point. This
is the more difficult problem. A computer has no problem with it, but except in very simple
cases, it is difficult to do by hand.


To compute the permutations, the computer chooses a “base point” b in the cut plane
T and computes the n roots of the polynomial f(b, x) numerically, with a suitable accuracy.
It numbers these roots arbitrarily, say γ_1, ..., γ_n, and labels the sheets by calling S_i the
sheet that contains the root γ_i. Then it walks to a point b_ν in the vicinity of a branch point
p_ν, taking care not to cross any of the cuts. The roots γ_i vary continuously, and the computer
can follow this variation by recomputing roots every time it takes a small step. This tells it
how to label the sheets at the point b_ν. Then to determine the permutation σ_ν, the computer
follows a counterclockwise loop ℓ_ν around p_ν, again recomputing roots as it goes along.
Because the loop crosses the cut C_ν, the roots will have been permuted by σ_ν when the path
returns to b_ν. In this way, the computer determines σ_ν. And because the numbering has
been established at the base point b, it will be the same for all of the branch points.
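The procedure just described can be sketched in a few lines of Python. This is a minimal illustration under several assumptions that are not in the text: the polynomial is monic in x and is supplied as a function coeffs(t) returning its coefficients in x (highest degree first); the starting roots are found by the Durand-Kerner iteration; each root is followed by Newton's method; and the walk from the base point to the loop is a straight segment that is assumed not to meet a cut or a branch point. All function names are hypothetical.

```python
import cmath

def horner(c, x):
    """Return (p(x), p'(x)) for the coefficient list c, highest degree first."""
    p, dp = 0j, 0j
    for a in c:
        dp = dp * x + p
        p = p * x + a
    return p, dp

def initial_roots(c, iters=500):
    """Durand-Kerner iteration for a monic polynomial with simple roots."""
    n = len(c) - 1
    z = [(0.4 + 0.9j) ** k for k in range(n)]
    for _ in range(iters):
        for i in range(n):
            denom = 1 + 0j
            for j in range(n):
                if j != i:
                    denom *= z[i] - z[j]
            z[i] -= horner(c, z[i])[0] / denom
    return z

def follow(coeffs, roots, path):
    """Carry each root along the points of `path`, polishing with Newton steps."""
    roots = list(roots)
    for t in path:
        c = coeffs(t)
        for i, x in enumerate(roots):
            for _ in range(6):
                p, dp = horner(c, x)
                x -= p / dp
            roots[i] = x
    return roots

def segment(a, b, steps):
    """Points on the straight segment from a (excluded) to b (included)."""
    return [a + (b - a) * k / steps for k in range(1, steps + 1)]

def monodromy(coeffs, base, center, radius, steps=1000):
    """Permutation (0-indexed tuple) for the path: base -> circle about `center`,
    once counterclockwise around the circle, and back to base."""
    start = initial_roots(coeffs(base))
    near = center + radius * (base - center) / abs(base - center)
    loop = [center + (near - center) * cmath.exp(2j * cmath.pi * k / steps)
            for k in range(1, steps + 1)]
    path = segment(base, near, steps) + loop + segment(near, base, steps)
    end = follow(coeffs, start, path)
    n = len(start)
    # sigma[i] = j means the root labeled i ends where the root labeled j started.
    return tuple(min(range(n), key=lambda j: abs(end[i] - start[j]))
                 for i in range(n))
```

With `coeffs = lambda t: [1, 0, -3, t]` (the polynomial x^3 − 3x + t) and base point 0, the calls `monodromy(coeffs, 0, 2, 0.5)` and `monodromy(coeffs, 0, -2, 0.5)` should return two different transpositions, both labeled with the numbering fixed at the base point. The step counts are chosen generously for this example and would need tuning for polynomials whose roots come closer together.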

Needless to say, doing this by hand is incredibly tedious. We find ways to get around 
the problem in the examples we present below. 


The local problem can be solved by analytic methods, and we give an incomplete
analysis here. The method is to relate the Riemann surface to one that we know, namely to
the Riemann surface Y of the polynomial y^k − t. Let t_0 be a branch point of the Riemann
surface X: {f(t, x) = 0}, where f is a polynomial of the form (15.9.1). Substituting t = t_0,
we obtain the one-variable polynomial f^0(x) = f(t_0, x).


Lemma 15.9.13 Let x_0 be a root of f^0(x). Suppose that


• x_0 is a k-fold root of f^0(x), and
• the partial derivative ∂f/∂t is not zero at the point (t_0, x_0).


Then the permutation of the sheets at the point t_0 contains a k-cycle.


Proof. We change variables to move the point (t_0, x_0) to the origin (0, 0), so that f^0(x) =
f(0, x), and we write f(t, x) = f^0(x) − t v(t, x). Then (∂f/∂t)(0, 0) = −v(0, 0). Our hypotheses
tell us that v(0, 0) ≠ 0. Also, since x = 0 is a k-fold root of f^0(x), that polynomial has the
form x^k u(x), where u(x) is a polynomial in x and u(0) ≠ 0. Then f(t, x) = x^k u(x) − t v(t, x).
Let c = u(0)/v(0, 0). We replace t by c^{-1}t. The result is that now u(0)/v(0, 0) = 1.

We restrict attention to a small neighborhood U of the origin (0, 0) in (t, x)-space, and write
the equation f = 0 as

x^k u/v = t.


For (t, x) in U, u/v is near to 1. Among the kth roots of u/v, one will be near to 1, and that
root, call it w, depends continuously on the point (t, x) in U. The other kth roots will be
ζ^i w, where ζ = e^{2πi/k}.

Let y = xw. Then in our neighborhood U, the equation f(t, x) = 0 is equivalent to
y^k = t. Therefore there are k sheets of our Riemann surface X that intersect U, and when
we make a loop around the point t = 0, those k sheets will be permuted in the same way as
the sheets of the Riemann surface Y, i.e., cyclically. □


We now describe the branching data for a few simple polynomials. We take polynomials
that are monic in x. The branch points will be the points t_0 at which f(t_0, x) has multiple
roots, that is, the points at which f(t_0, x) and (∂f/∂x)(t_0, x) have a common root. Lemma 15.9.13
will be our main tool.


Examples 15.9.14 (a) f(t, x) = x^2 − t^3 + t, ∂f/∂x = 2x, ∂f/∂t = −3t^2 + 1.


Here X is a two-sheeted covering of T. There are three branch points t = 0, t = 1,
and t = −1, and ∂f/∂t ≠ 0 at all of them. So the permutation of the sheets at each of these points
contains a two-cycle. Since there are two sheets, each of the permutations is the transposition
(12). We don’t need to be careful about the numbering when there are two sheets.


(b) We ask for a path connected, three-sheeted branched covering X of T branched at two
points p_1 and p_2, and such that the permutation σ_i at the point p_i is a transposition.

We may label the sheets so that σ_1 = (12). Then because X is path connected,
the permutation σ_2 must be either (23) or (13) (15.9.11). Switching the sheets called S_1
and S_2 doesn’t affect σ_1, but it interchanges the two other transpositions, so with suitable
numbering of the sheets, σ_1 = (12) and σ_2 = (23). There is just one isomorphism class of
such coverings.

The Riemann Existence theorem tells us that there is, up to isomorphism, a unique 
field extension K of F with this covering as its Riemann surface. Of course K will depend 
on the location of the two branch points but they can be moved to any position by a linear 
change of variable in t.

How do we find a polynomial f(t, x) whose Riemann surface has this form? There is
no general method, so one has to guess, and this case is simple enough that it can be guessed
fairly easily. Since there is very little branching, we look for a very simple polynomial
that is cubic in x. It takes a bit of courage to start looking, but one of the first attempts might
be a polynomial of the form x^3 + x + t. This will work, but let’s take f(t, x) = x^3 − 3x + t
instead. Then ∂f/∂x = 3x^2 − 3 and ∂f/∂t = 1. Substituting the roots x = ±1 of ∂f/∂x into f, one finds
that the branch points are the points t = ±2. Since ∂f/∂t is nowhere zero, Lemma 15.9.13
applies.

There is a double root at the point (t, x) = (2, 1), which lies over the branch point p_1 = 2.
So σ_1 contains a 2-cycle, a transposition. Similarly, σ_2 is a transposition. So apart from the
location of the two branch points, the Riemann surface X of the polynomial f = x^3 − 3x + t
has the desired properties, and F[x]/(f) defines the field extension with that branching.
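As a check on the branch points of f = x^3 − 3x + t, the two specializations can be factored by hand. The display below is a verification added for illustration.

```latex
\begin{aligned}
f(2, x)  &= x^3 - 3x + 2 = (x - 1)^2 (x + 2),\\
f(-2, x) &= x^3 - 3x - 2 = (x + 1)^2 (x - 2).
\end{aligned}
```

So the double root is x = 1 over t = 2 and x = −1 over t = −2. In each case the third root is simple, which is consistent with each of the two permutations being a single transposition.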


(c) f(t, x) = x^3 − t^3 + t^2, ∂f/∂x = 3x^2, ∂f/∂t = −3t^2 + 2t.


Here X is a three-sheeted covering of T. The branch points are at t = 0 and t = 1, and
both f(0, x) and f(1, x) have triple roots. Let σ_0 and σ_1 denote the permutations of the
sheets at the branch points. The partial derivative ∂f/∂t is not zero at t = 1, so the three sheets
are permuted cyclically there. With suitable numbering, σ_1 will be (123).

The point t = 0 presents problems. First, ∂f/∂t vanishes there. Second, how can we make
sure to use the same numbering of the sheets at the two points? In the previous example,
knowing that the Riemann surface must be path connected was enough to determine the
branching. This fact gives us no information here because σ_1 operates transitively on the
sheets by itself.

We use a trick that works only in the simplest cases. That is to compute the permutation
that we get by walking around a large circle Γ. A large circular path will cross each of the
cuts once (see Figure 15.9.8), so the sheets will be permuted by the product permutation
σ_0σ_1, or by σ_1σ_0, depending on where we start. If we can determine that permutation, then
since we know σ_1, we will be able to recover σ_0.

The substitution t = u^{-1} maps T bijectively to the complex u-plane U, except that it
is undefined at the points t = 0 and u = 0. Because u → 0 as t → ∞, the point u = 0 of U is
called the point at infinity of T. Our large circle Γ in T corresponds to a small circle, we’ll call
it L, that circles the origin in U. However, a counterclockwise walk around Γ corresponds
to a clockwise walk around L: If t = re^{iθ}, then u = r^{-1}e^{−iθ}.

We make the substitution t = u^{-1} into the polynomial f = x^3 − t^3 + t^2 and clear
denominators, obtaining x^3u^3 − 1 + u. When analyzing such a substitution, one usually has
to substitute for x as well. It seems clear here that we should set y = ux. This gives us


y^3 − 1 + u.

Let’s call this polynomial g(u, y). The Riemann surfaces X and Y: {g = 0} correspond via
the substitution (x, t) ⟷ (y, u), which is defined and invertible except above the origins in
the planes T and U. Therefore the permutation of sheets of X defined by a counterclockwise
walk around Γ will be the same as the permutation of sheets of Y defined by a clockwise
walk around L. That permutation is trivial, because the Riemann surface Y is not branched
at u = 0. Therefore σ_0σ_1 = 1, and since σ_1 = (123), σ_0 = (321). □


15.10 THE FUNDAMENTAL THEOREM OF ALGEBRA 


A field F is algebraically closed if every polynomial of positive degree with coefficients in 
F has a root in F. The Fundamental Theorem of Algebra asserts that the field of complex 
numbers is algebraically closed. 


Theorem 15.10.1 Fundamental Theorem of Algebra. Every nonconstant polynomial with 
complex coefficients has a complex root. 


There are several proofs of this theorem, and one of them is particularly appealing. 
We present it in outline. We must prove that a nonconstant polynomial 


(15.10.2) f(x) = x^n + a_{n−1}x^{n−1} + ··· + a_1x + a_0


with complex coefficients has a complex root. If a_0 = 0, then 0 is a root, so we may assume
that a_0 ≠ 0.

The rule y = f(x) defines a function from the complex x-plane to the complex y-plane.
Let C_r denote a circle of radius r about the origin in the complex x-plane, parametrized as
x = re^{iθ}, with 0 ≤ θ < 2π. We inspect the image f(C_r) of C_r.

To warm up, we consider the function defined by the polynomial y = x^n = r^n e^{inθ}. As
θ runs from 0 to 2π, the point x travels once around the circle of radius r. At the same time,
nθ runs from 0 to 2πn. The point y winds n times around the circle of radius r^n.

Now let f be the polynomial (15.10.2). For sufficiently large r, x^n is the dominant term
in f(x). To make this precise, let m be the maximum absolute value of the coefficients a_i of
f. Then if |x| = r > 10nm,


|f(x) − x^n| = |a_{n−1}x^{n−1} + ··· + a_1x + a_0| ≤ nm|x|^{n−1} < (1/10) r^n.
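The estimate can be unpacked term by term. The display below is a sketch of the omitted steps, where m denotes the maximum of the absolute values |a_i|. It uses the added assumption that r ≥ 1 (automatic when 10nm ≥ 1), so that r^i ≤ r^{n−1} for every i ≤ n − 1.

```latex
\begin{aligned}
|f(x) - x^n| &\le |a_{n-1}|\,r^{n-1} + \cdots + |a_1|\,r + |a_0| \\
             &\le n\,m\,r^{n-1} \;=\; \frac{nm}{r}\,r^n \;<\; \frac{1}{10}\,r^n
             \qquad \text{when } r > 10nm .
\end{aligned}
```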


It follows from this inequality that, as θ runs from 0 to 2π and x^n winds n times around
the circle of radius r^n, f(x) also winds around the origin n times. A good way to visualize
this conclusion is with the dog-on-a-leash model. If someone walks a dog n times around a
large circular path, the dog also goes around n times, though perhaps following a different
path. This will be true provided that the leash is shorter than the radius of the path. Here x^n
represents the position of the person at the time θ, and f(x) represents the position of the
dog. The radius of the path is r^n and the length of the leash is (1/10) r^n.

We vary the radius r. Since f is a continuous function, the image f(C_r) will vary
continuously with r. When the radius r is very small, f(C_r) makes a small loop around the
constant term a_0 of f. This small loop won’t wind around the origin at all. But as we just
saw, f(C_r) winds n times around the origin if r is large enough. The only explanation for
this is that for some intermediate radius r′, f(C_{r′}) passes through the origin. This means
that for some point α on the circle C_{r′}, f(α) = 0. Then α is a root of f.
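The change in winding number can be observed numerically. The following Python sketch is an illustration, not part of the argument: it samples the circle C_r, adds up the small changes in the argument of f(x), and divides by 2π. It assumes that the image curve does not pass through the origin and that the sampling is fine enough; the function name winding_number and the sample polynomial are choices made here.

```python
import cmath

def winding_number(coeffs, r, samples=20000):
    """Winding number about 0 of the image of the circle |x| = r under the
    polynomial with the given coefficients (highest degree first).

    Assumes the image curve avoids the origin.
    """
    def f(x):
        y = 0j
        for a in coeffs:
            y = y * x + a
        return y

    total = 0.0
    prev = f(complex(r, 0))
    for k in range(1, samples + 1):
        cur = f(r * cmath.exp(2j * cmath.pi * k / samples))
        # Phase of the quotient is the small change in argument, in (-pi, pi].
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2 * cmath.pi))
```

For f(x) = x^3 + 2x + 5, whose roots have absolute values of about 1.33 and 1.94, the winding number is 0 for r = 0.1, 1 for r = 1.6, and 3 for r = 200. The jumps occur exactly at the radii where f(C_r) passes through the origin, that is, at the absolute values of the roots.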


I don’t consider this algebra,
but this doesn’t mean that algebraists can’t do it.


—Garrett Birkhoff 


EXERCISES 


Section 1 Examples of Fields 
1.1. Let R be an integral domain that contains a field F as subring and that is finite-dimensional
when viewed as vector space over F. Prove that R is a field.


1.2. Let F be a field, not of characteristic 2, and let x^2 + bx + c = 0 be a quadratic equation
with coefficients in F. Prove that if δ is an element of F such that δ^2 = b^2 − 4c, then
x = (−b + δ)/2 solves the quadratic equation in F. Prove also that if the discriminant
b^2 − 4c is not a square, the polynomial has no root in F.


1.3. Which subfields of C are dense subsets of C? 


Section 2 Algebraic and Transcendental Elements 


2.1. Let α be a complex root of the polynomial x^3 − 3x + 4. Find the inverse of α^2 + α + 1 in
the form a + bα + cα^2, with a, b, c in Q.


2.2. Let f(x) = x^n − a_{n−1}x^{n−1} + ··· ± a_0 be an irreducible polynomial over F, and let α be
a root of f in an extension field K. Determine the element α^{-1} explicitly in terms of α
and of the coefficients a_i.


2.3. Let β = ω∛2, where ω = e^{2πi/3}, and let K = Q(β). Prove that the equation x_1^2 + ··· + x_k^2 =
−1 has no solution with x_i in K.


Section 3 The Degree of a Field Extension


3.1. Let F be a field, and let α be an element that generates a field extension of F of degree 5.
Prove that α^2 generates the same extension.

3.2. Prove that the polynomial x^4 + 3x + 3 is irreducible over the field Q(∛2).

3.3. Let ζ_n = e^{2πi/n}. Prove that ζ_5 ∉ Q(ζ_7).

3.4. Let ζ_n = e^{2πi/n}. Determine the irreducible polynomial over Q and over Q(ζ_3) of
(a) ζ_4, (b) ζ_6, (c) ζ_8, (d) ζ_9, (e) ζ_10, (f) ζ_12.


3.5. Determine the values of n such that ζ_n has degree at most 3 over Q.


3.6. Let a be a positive rational number that is not a square in Q. Prove that ∜a has degree 4
over Q.

3.7. (a) Is i in the field Q(∜−2)? (b) Is ∛5 in the field Q(∛2)?

3.8. Let α and β be complex numbers. Prove that if α + β and αβ are algebraic numbers,
then α and β are also algebraic numbers.

3.9. Let α and β be complex roots of irreducible polynomials f(x) and g(x) in Q[x]. Let
K = Q(α) and L = Q(β). Prove that f(x) is irreducible in L[x] if and only if g(x) is
irreducible in K[x].

3.10. A field extension K/F is an algebraic extension if every element of K is algebraic
over F. Let K/F and L/K be algebraic field extensions. Prove that L/F is an algebraic
extension.

Section 4 Finding the Irreducible Polynomial 


4.1. Let K = Q(α), where α is a root of x^3 − x − 1. Determine the irreducible polynomial for
γ = 1 + α^2 over Q.

4.2. Determine the irreducible polynomial for α = √3 + √5 over the following fields.
(a) Q, (b) Q(√5), (c) Q(√10), (d) Q(√15).

4.3. With reference to Example 15.4.4(b), determine the irreducible polynomial for γ =
α_1 + α_2 over Q.


Section 5 Constructions with Ruler and Compass 


5.1. Express cos 15° in terms of real square roots.

5.2. Prove that the regular pentagon can be constructed by ruler and compass
(a) by field theory, (b) by finding an explicit construction.

5.3. Decide whether or not the regular 9-gon is constructible by ruler and compass.

5.4. Is it possible to construct a square whose area is equal to that of a given triangle?

5.5. Referring to the proof of Proposition 15.5.5, suppose that the discriminant D is negative.
Determine the line that appears at the end of the proof geometrically.

5.6. Thinking of the plane as the complex plane, describe the set of constructible points as
complex numbers.


Section 6 Adjoining Roots


6.1. Let F be a field of characteristic zero, let f′ denote the derivative of a polynomial f in
F[x], and let g be an irreducible polynomial that is a common divisor of f and f′. Prove
that g^2 divides f.

6.2. (a) Let F be a field of characteristic zero. Determine all square roots of elements of F
that a quadratic extension of the form F(√a) contains.
(b) Classify quadratic extensions of Q.

6.3. Determine the quadratic number fields Q[√d] that contain a primitive nth root of unity,
for some integer n.


Section 7 Finite Fields 


7.1. Identify the group F_4^+.

7.2. Determine the irreducible polynomial of each of the elements of F_8 in the list 15.7.8.

7.3. Find a 13th root of 2 in the field F_13.

7.4. Determine the number of irreducible polynomials of degree 3 over F_3 and over F_5.

7.5. Factor x^9 − x and x^27 − x in F_3.

7.6. Factor the polynomial x^16 − x over the fields F_4 and F_8.

7.7. Let K be a finite field. Prove that the product of the nonzero elements of K is −1.

7.8. The polynomials f(x) = x^3 + x + 1 and g(x) = x^3 + x^2 + 1 are irreducible over F_2. Let
K be the field extension obtained by adjoining a root of f, and let L be the extension
obtained by adjoining a root of g. Describe explicitly an isomorphism from K to L, and
determine the number of such isomorphisms.

7.9. Work this problem without appealing to Theorem (15.7.3). Let F = F_p.
(a) Determine the number of monic irreducible polynomials of degree 2 in F[x].
(b) Let f(x) be an irreducible polynomial of degree 2 in F[x]. Prove that K = F[x]/(f)
is a field of order p^2, and that its elements have the form a + bα, where a and b are
in F and α is a root of f in K. Moreover, every such element with b ≠ 0 is the root
of an irreducible quadratic polynomial in F[x].
(c) Show that every polynomial of degree 2 in F[x] has a root in K.
(d) Show that all the fields K constructed as above for a given prime p are isomorphic.

*7.10. Let F be a finite field, and let f(x) be a nonconstant polynomial whose derivative is the
zero polynomial. Prove that f cannot be irreducible over F.

7.11. Let f = ax^2 + bx + c with a, b, c in a ring R. Show that the ideal of the polynomial ring
R[x] that is generated by f and f′ contains the discriminant, the constant polynomial
b^2 − 4ac.

7.12. Let p be a prime integer, and let q = p^r and q′ = p^k. For which values of r and k does
x^q − x divide x^{q′} − x in Z[x]?

7.13. Prove that a finite subgroup of the multiplicative group of any field F is a cyclic group.

7.14. Find a formula in terms of the Euler φ function for the number of irreducible polynomials
of degree n over the field F_p.


Section 8 Primitive Elements 


8.1. Prove that every finite extension of a finite field has a primitive element.

8.2. Determine all primitive elements for the extension K = Q(√2, √3) of Q.


Section 9 Function Fields

9.1. Let f(x) be a polynomial with coefficients in a field F. Prove that if there is a rational
function r(x) such that r^2 = f, then r is a polynomial.

9.2. Determine the branch points and the gluing data for the Riemann surfaces of the 
following polynomials. 
(a)x? — 2° 4:1, (b) x4 -—t-1, (0) x3 —3tx- 42, (d) x3 — 3x? - 18, 
(e) x? —t(t— 1), (f£) x9 — 3tx2 +1, (gy x44+4x424, (h) x3 -3rxr-1-F. 

9.3. (a) Determine the number of isomorphism classes of function fields K of degree 3 over
F = C(t) that are ramified only at the points 1 and −1.

(b) Describe gluing data for the Riemann surface corresponding to each isomorphism
class of fields as a pair of permutations.

(c) For each isomorphism class, find a polynomial f(t, x) such that K = F[x]/(f)
represents the isomorphism class.

*9.4. Prove the Riemann Existence Theorem for quadratic extensions.
Hint: Show that up to isomorphism, a quadratic extension of F is described by the finite
set {p_1, ..., p_k} of its true branch points.

*9.5. Write a computer program that determines the branch points p_ν and the permutations
σ_ν for the Riemann surface of a given polynomial.


Section 10 The Fundamental Theorem of Algebra

10.1. Prove that the subset of C consisting of the algebraic numbers is algebraically closed.

10.2. Construct an algebraically closed field that contains the prime field F_p.


*10.3. With notation as at the end of the section, a comparison of the images f(C_r) for varying
radii shows another interesting geometric feature: For large r, the curve f(C_r) makes n
loops around the origin. Its total curvature is 2πn. Assuming that the coefficient a_1 is not
zero, the linear term a_1z + a_0 dominates f(z) for small z. Then for small r, f(C_r) makes
a single loop around a_0. Its total curvature is only 2π. Something happens to the loops as
r varies. Explain.


*10.4. Write a computer program to illustrate the variation of f(C,) with r. 


Miscellaneous Exercises 
M.1. Let K = F(α) be a field extension generated by a transcendental element α, and let β
be an element of K that is not in F. Prove that α is algebraic over the field F(β).

M.2. Factor x^7 + x + 1 in F_7[x].


*M.3. Let f(x) be an irreducible polynomial of degree 6 over a field F, and let K be a quadratic
extension of F. What can be said about the degrees of the irreducible factors of f in
K[x]?

M.4. (a) Let p be an odd prime. Prove that exactly half of the elements of F_p^× are squares and
that if α and β are nonsquares, then αβ is a square.
(b) Prove the same assertion for any finite field of odd order.
(c) Prove that in a finite field of even order, every element is a square.
(d) Prove that the irreducible polynomial for γ = √2 + √3 over Q is reducible modulo
p for every prime p.

*M.5. Prove that any element of GL_2(Z) of finite order has order 1, 2, 3, 4, or 6
(a) by using field theory,
(b) by applying the Crystallographic Restriction.

*M.6. (a) Prove that a rational function f(t) that generates the field C(t) of all rational
functions defines a bijective map T′ → T′.
(b) Prove that a rational function f(x) generates the field of rational functions C(x) if and
only if it is of the form (ax + b)/(cx + d), with ad − bc ≠ 0.
(c) Identify the group of automorphisms of C(x) that are the identity on C.

*M.7. Prove that the homomorphism SL_2(Z) → SL_2(F_p) obtained by reducing the matrix
entries modulo p is surjective.


CHAPTER 16 


Galois Theory 


En un mot les calculs sont impraticables. 


—Evariste Galois 


We have seen that computation in an extension field generated by a single algebraic element
α can be made simply, by identifying it with the formally constructed field F[x]/(f),
where f is the irreducible polynomial for α over F. But suppose that f factors into
linear factors in an extension field K. It isn’t clear how to compute with all of the roots
at the same time. To do that we need to know how they are related, and that depends
on the particular case. The fundamental discovery that arose through the work of several
people, especially of Lagrange and Galois, is that the relationships between the roots
are best understood indirectly, in terms of symmetry. That symmetry is the topic of this
chapter.

Beginning in Section 16.4, we assume that the fields we are working with have 
characteristic zero. The most important consequences of this assumption are: 


• The roots of an irreducible polynomial over a field F are distinct (15.6.8).
• A finite extension field K/F has a primitive element (15.8.1).


16.1 SYMMETRIC FUNCTIONS 


Let R[u] denote the polynomial ring R[u_1, ..., u_n] in n variables over a ring R.
A permutation σ of the indices {1, ..., n} operates on polynomials by permuting the
variables:

(16.1.1) f = f(u_1, ..., u_n) ⇝ f(u_{σ1}, ..., u_{σn}) = σ(f).


In this way, σ defines an automorphism of R[u] that we denote by σ too. Because σ acts
as the identity on the constant polynomials, it is called an R-automorphism. The symmetric
group S_n operates by R-automorphisms on the polynomial ring. A symmetric polynomial
is one that is fixed by every permutation. The symmetric polynomials form a subring of the
polynomial ring R[u].

A polynomial g is symmetric if two monomials that are in the same orbit, such as u_1^2u_2
and u_2^2u_3, have the same coefficient in g. We call the sum of the monomials in an orbit an
orbit sum. The orbit sums form a basis for the space of symmetric polynomials. The orbit
sums of degree at most 3 in three variables are


1, u_1 + u_2 + u_3, u_1^2 + u_2^2 + u_3^2, u_1u_2 + u_1u_3 + u_2u_3,
u_1^3 + u_2^3 + u_3^3, u_1^2u_2 + u_2^2u_1 + u_1^2u_3 + u_3^2u_1 + u_2^2u_3 + u_3^2u_2, u_1u_2u_3.


The elementary symmetric functions are some special symmetric polynomials. When 
there are n variables, they are 


s_1 = Σ_i u_i = u_1 + u_2 + ··· + u_n
s_2 = Σ_{i<j} u_iu_j = u_1u_2 + u_1u_3 + ···
s_3 = Σ_{i<j<k} u_iu_ju_k = u_1u_2u_3 + ···
  ⋮
s_n = u_1u_2 ··· u_n.


Indices have been chosen so that s_i is the orbit sum of the monomial u_1u_2 ··· u_i. In the
list above, the elementary symmetric functions in three variables are u_1 + u_2 + u_3,
u_1u_2 + u_1u_3 + u_2u_3, and u_1u_2u_3.

The elementary symmetric functions are the coefficients of the polynomial with variable
roots u_1, ..., u_n:


(16.1.2) P(x) = (x − u_1)(x − u_2) ··· (x − u_n)
              = x^n − s_1x^{n−1} + s_2x^{n−2} − ··· ± s_n.

When n = 2,

P(x) = (x − u_1)(x − u_2) = x^2 − (u_1 + u_2)x + (u_1u_2),

and when n = 3,

P(x) = x^3 − (u_1 + u_2 + u_3)x^2 + (u_1u_2 + u_1u_3 + u_2u_3)x − (u_1u_2u_3).
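The relation between the coefficients of P(x) and the elementary symmetric functions can be checked on numbers. The Python sketch below is an illustration only. It evaluates s_i at given integer values and multiplies out the product one factor at a time; the function names are hypothetical, and s_0 is taken to be 1.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(values, i):
    """s_i at the given values: the sum of all products of i distinct entries."""
    return sum(prod(c) for c in combinations(values, i))

def expand_product(values):
    """Coefficients of (x - u_1)(x - u_2)...(x - u_n), highest degree first."""
    coeffs = [1]
    for u in values:
        # Multiply the current polynomial by (x - u):
        # shifting left is multiplication by x, shifting right aligns the -u term.
        coeffs = [a - u * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs
```

For the values 2, 3, 5, 7 the expansion is x^4 − 17x^3 + 101x^2 − 247x + 210, and 17, 101, 247, 210 are the values of s_1, s_2, s_3, s_4, with the alternating signs of (16.1.2).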


The order of the indices in (16.1.2) is the reverse of the one we have used for the coefficients 
of a polynomial previously, and the signs alternate. Because of the way these indices and 
signs appear, we will label undetermined coefficients of a polynomial in the analogous form 
in this chapter: 


(16.1.3) f(x) = x^n − a_1x^{n−1} + a_2x^{n−2} − ··· ± a_n.


As before, we say that a polynomial f splits completely in a field K if it factors into
linear factors, say


(16.1.4) f(x) = (x − α_1) ··· (x − α_n),


with α_i in K. If so, then substituting u_i = α_i shows that the coefficients of f are obtained by
evaluating the symmetric functions.

Lemma 16.1.5 If (16.1.4) is a factorization of the polynomial (16.1.3), then a_i =
s_i(α_1, ..., α_n). □


Theorem 16.1.6 Symmetric Functions Theorem. Every symmetric polynomial g(u_1, ..., u_n)
with coefficients in a ring R can be written in a unique way as a polynomial in the elementary
symmetric functions s_1, ..., s_n.


To be precise: If g(u) is a symmetric polynomial, there is a unique polynomial G(z_1, ..., z_n)
with coefficients in R in another set of variables z_1, ..., z_n, such that g(u) is obtained by
the substitution z_i ⇝ s_i: g(u_1, ..., u_n) = G(s_1, ..., s_n).


We prove the theorem below, but first, some examples: 


Examples 16.1.7 (a) The symmetric polynomial u_1^2 + ··· + u_n^2, because it has degree 2,
is a linear combination c_1s_1^2 + c_2s_2. One can use special values of the variables to determine
the coefficients. Substituting u = (1, 0, ..., 0) shows that c_1 = 1, and substituting u =
(1, −1, 0, ..., 0) shows that c_2 = −2:


(16.1.8) u_1^2 + ··· + u_n^2 = s_1^2 − 2s_2.

(b) We use a different method for the symmetric polynomial

(16.1.9) g(u) = u_1^2u_2 + u_2^2u_1 + u_1^2u_3 + u_3^2u_1 + u_2^2u_3 + u_3^2u_2


in the three variables u_1, u_2, u_3. The first step is to set u_3 = 0. We obtain the symmetric
polynomial g^0 = u_1^2u_2 + u_2^2u_1 in the remaining variables. Let s_i^0 denote the elementary
symmetric functions in u_1, u_2: s_1^0 = u_1 + u_2 and s_2^0 = u_1u_2. We notice that g^0 = s_1^0 s_2^0.
The second step is to compare the polynomial g with the three-variable symmetric
polynomial s_1s_2:

s_1s_2 = (u_1 + u_2 + u_3)(u_1u_2 + u_1u_3 + u_2u_3).


We won’t expand the right side explicitly. Instead, we note that the expansion has nine terms,
one of which is u_1^2u_2. Since s_1s_2 is symmetric, the orbit sum g of u_1^2u_2, which has six
terms, appears. The three remaining terms are equal to u_1u_2u_3 = s_3:


(16.1.10) g = s_1s_2 − 3s_3.


This computation is an example of a systematic method, and the proof of the Symmetric
Functions Theorem, which we explain next, is based on that method. □
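Formulas (16.1.8) and (16.1.10) can be spot-checked on integer values. The Python sketch below is a numerical illustration, not a proof. The helper names are hypothetical, and the orbit sum of u_1^2u_2 is computed as the sum of u_i^2 u_j over all ordered pairs of distinct indices.

```python
from itertools import combinations, permutations
from math import prod

def s(values, i):
    """Elementary symmetric function s_i evaluated at the given values."""
    return sum(prod(c) for c in combinations(values, i))

def sum_of_squares(values):
    """u_1^2 + ... + u_n^2, the left side of (16.1.8)."""
    return sum(u * u for u in values)

def orbit_sum_u1sq_u2(values):
    """Orbit sum of u_1^2 u_2: sum of u_i^2 u_j over ordered pairs with i != j."""
    return sum(a * a * b for a, b in permutations(values, 2))
```

For u = (2, 3, 5) one finds s_1 = 10, s_2 = 31, s_3 = 30, so s_1^2 − 2s_2 = 38 = 4 + 9 + 25 and s_1s_2 − 3s_3 = 220, which agrees with the six-term orbit sum. The same comparison with four values, for example (1, 2, 3, 4), also agrees.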


Proof of the Symmetric Functions Theorem. There is nothing to show when n = 1, because
u_1 = s_1 in that case. Proceeding by induction, we assume the theorem proved for symmetric
functions in n − 1 variables. Given a symmetric polynomial g in u_1, ..., u_n, we consider
the polynomial g^0 obtained by substituting zero for the last variable: g^0(u_1, ..., u_{n−1}) =
g(u_1, ..., u_{n−1}, 0). We note that g^0 is symmetric in u_1, ..., u_{n−1}. So by the induction
hypothesis, g^0 may be written as a polynomial in the elementary symmetric functions in
u_1, ..., u_{n−1}, which we label as s_1^0, ..., s_{n−1}^0:


s_1^0 = u_1 + u_2 + ··· + u_{n−1}, etc.

There is a polynomial Q(z_1, ..., z_{n−1}) such that g^0 = Q(s_1^0, ..., s_{n−1}^0).

Lemma 16.1.11 Let g be a symmetric polynomial of degree d in the variables u1,..., Un, 
and suppose that g° = Q(s},...,57_,). Then g = Q(5),...,5n-1) + Snh, where h is a 
symmetric polynomial in u1,..., un of degree d —n. 


Proof. Let $p(u_1, \ldots, u_n) = g(u_1, \ldots, u_n) - Q(s_1, \ldots, s_{n-1})$. This is a difference of
symmetric polynomials, so it is symmetric, and if we set $u_n = 0$, we obtain $p(u_1, \ldots, u_{n-1}, 0) =
g^\circ - Q(s^\circ) = 0$. Therefore $u_n$ divides $p$. Because $p$ is symmetric, every $u_i$ divides $p$, and
therefore $s_n$ divides $p$. Writing $p = s_n h$, the polynomial $h$ is symmetric. This gives us an
equation of the form claimed by the lemma. □


We go back to the proof of the Symmetric Functions Theorem. The lemma tells us that
$g = Q(s) + s_n h$, where $h$ is symmetric. A second induction, this time on the degree of
a symmetric polynomial, allows us to conclude that $h$ is a polynomial in the elementary symmetric
functions. Then so is $g$.

One can show that $G$ is uniquely determined by going over this proof. □

We give one more example of the systematic method. Let $g$ be the orbit sum of
the monomial $u_1^2 u_2$, but this time in four variables $u_1, \ldots, u_4$. Let $s_1, \ldots, s_4$ denote the
elementary symmetric functions in four variables. We set $u_4 = 0$, and obtain formula
(16.1.10), written now as $g^\circ = s_1^\circ s_2^\circ - 3 s_3^\circ$. Then as in the above lemma,

$g = s_1 s_2 - 3 s_3 + s_4 h$.

Since $g$ has degree 3, $h = 0$. Formula 16.1.10 remains valid when $g$ is the orbit sum of $u_1^2 u_2$
in any number $n \geq 3$ of variables.
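Rewriting a symmetric polynomial in terms of $s_1, \ldots, s_n$ can also be mechanized. The sketch below does not follow the induction on the number of variables used in the proof above; it uses a different standard procedure, repeatedly cancelling the lexicographically largest monomial with a product of elementary symmetric functions. Polynomials are stored as dictionaries from exponent tuples to integer coefficients, and all function names are illustrative.

```python
from itertools import combinations, permutations

def poly_add(p, q, scale=1):
    """Return p + scale*q for polynomials stored as {exponent tuple: coefficient}."""
    r = dict(p)
    for mono, c in q.items():
        v = r.get(mono, 0) + scale * c
        if v:
            r[mono] = v
        else:
            r.pop(mono, None)
    return r

def poly_mul(p, q):
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = tuple(a + b for a, b in zip(m1, m2))
            r[m] = r.get(m, 0) + c1 * c2
    return {m: c for m, c in r.items() if c}

def elementary(n, k):
    """The elementary symmetric function s_k in n variables."""
    return {tuple(1 if i in idx else 0 for i in range(n)): 1
            for idx in combinations(range(n), k)}

def orbit_sum(mono):
    """Sum of the distinct monomials obtained by permuting the exponents."""
    return {m: 1 for m in set(permutations(mono))}

def in_elementary(g, n):
    """Write a symmetric polynomial g as G(s_1, ..., s_n).

    Returns {(e_1, ..., e_n): coefficient}, where the key stands for the
    monomial z_1^e_1 ... z_n^e_n of G.
    """
    G, g = {}, dict(g)
    while g:
        lead = max(g)                     # lexicographically largest exponent tuple
        c = g[lead]
        # s_1^e_1 ... s_n^e_n has leading monomial u^lead when e_i = lead_i - lead_{i+1}.
        e = tuple(lead[i] - (lead[i + 1] if i + 1 < n else 0) for i in range(n))
        assert all(x >= 0 for x in e), "input is not symmetric"
        term = {(0,) * n: 1}
        for i, ei in enumerate(e):
            for _ in range(ei):
                term = poly_mul(term, elementary(n, i + 1))
        g = poly_add(g, term, -c)         # cancels the leading monomial
        G[e] = G.get(e, 0) + c
    return G

print(in_elementary(orbit_sum((2, 1, 0)), 3))
print(in_elementary(orbit_sum((2, 1, 0, 0)), 4))
```

Both calls should report coefficient 1 on $z_1 z_2$ and coefficient $-3$ on $z_3$, in agreement with (16.1.10) for three and for four variables.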

Here is an important consequence of the Symmetric Functions Theorem:

Corollary 16.1.12 Suppose that a polynomial $f(x) = x^n - a_1 x^{n-1} + \cdots \pm a_n$ has coefficients
in a field $F$, and that it splits completely in an extension field $K$, with roots $\alpha_1, \ldots, \alpha_n$.
Let $g(u_1, \ldots, u_n)$ be a symmetric polynomial in $u_1, \ldots, u_n$ with coefficients in $F$. Then
$g(\alpha_1, \ldots, \alpha_n)$ is an element of $F$.

For instance, $\alpha_1^k + \alpha_2^k + \cdots + \alpha_n^k$ will be an element of $F$.

Proof. The Symmetric Functions Theorem tells us that $g$ is a polynomial in the elementary
symmetric functions. Say that $g(u_1, \ldots, u_n) = G(s_1, \ldots, s_n)$, where $G(z)$ is a polynomial
with coefficients in $F$. When we evaluate at $u = \alpha$, we obtain $s_i(\alpha) = a_i$ (16.1.5). So

(16.1.13) $g(\alpha_1, \ldots, \alpha_n) = G(a_1, \ldots, a_n)$.

Because $a_1, \ldots, a_n$ are in $F$ and $G$ has coefficients in $F$, $G(a)$ is in $F$. □
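As an illustration of the corollary, take $F = \mathbb{Q}$ and the polynomial $x^2 - x - 1$ (an example chosen here, not one from the text), whose roots $(1 \pm \sqrt{5})/2$ lie in $\mathbb{Q}(\sqrt{5})$. The sketch below computes the power sums $\alpha_1^k + \alpha_2^k$ with exact arithmetic and shows that the $\sqrt{5}$ part always vanishes. The class `QSqrt5` is a hypothetical helper written for this check.

```python
from fractions import Fraction

class QSqrt5:
    """Element a + b*sqrt(5) of Q(sqrt 5), with a and b rational."""
    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)

    def __add__(self, other):
        return QSqrt5(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        return QSqrt5(self.a * other.a + 5 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def power(self, k):
        result = QSqrt5(1)
        for _ in range(k):
            result = result * self
        return result

# Roots of x^2 - x - 1, so a_1 = 1 and a_2 = -1 in the notation of the corollary.
alpha1 = QSqrt5(Fraction(1, 2), Fraction(1, 2))
alpha2 = QSqrt5(Fraction(1, 2), Fraction(-1, 2))

def power_sum(k):
    return alpha1.power(k) + alpha2.power(k)

for k in range(1, 8):
    p = power_sum(k)
    print(k, p.a, p.b)   # p.b, the coefficient of sqrt(5), is 0 each time
```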


The next proposition provides a way to construct symmetric polynomials, starting with 
any polynomial. 

Proposition 16.1.14 Let $p_1 = p_1(u_1, \ldots, u_n)$ be a polynomial, let $\{p_1, \ldots, p_k\}$ be its
orbit for the operation of the symmetric group on the variables, and let $w = w_1, \ldots, w_k$
be another set of variables, where $k$ is the number of polynomials in the orbit of $p_1$. (So $k$
divides the order $n!$ of the symmetric group.) If $h(w_1, \ldots, w_k)$ is a symmetric polynomial
in $w$, then $h(p_1, \ldots, p_k)$ is a symmetric polynomial in $u$.

Proof. Except that it is slightly confusing, this is nearly trivial. A permutation of the variables
$u_1, \ldots, u_n$ permutes the set $\{p_1, \ldots, p_k\}$ because that set is an orbit. And because $h$ is a
symmetric polynomial, a permutation of $p_1, \ldots, p_k$ carries $h(p_1, \ldots, p_k)$ to itself. □


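Example 16.1.15 There are three variables $u_1, u_2, u_3$ and $p_1 = u_1^2 + u_2 u_3$. The orbit of $p_1$
consists of three polynomials:

$p_1 = u_1^2 + u_2 u_3, \quad p_2 = u_2^2 + u_3 u_1, \quad p_3 = u_3^2 + u_1 u_2$.

We substitute $w = p$ into the symmetric polynomial $w_1 w_2 + w_1 w_3 + w_2 w_3$, obtaining a
symmetric polynomial in $u$:

$p_1 p_2 + p_1 p_3 + p_2 p_3 = \underbrace{(u_1^2 u_2^2 + \cdots)}_{3\ \text{terms}} + \underbrace{(u_1^3 u_2 + \cdots)}_{6\ \text{terms}} + \underbrace{(u_1 u_2 u_3^2 + \cdots)}_{3\ \text{terms}}$. □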
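A quick numerical check of this example: evaluating $p_1 p_2 + p_1 p_3 + p_2 p_3$ at every permutation of a sample point should give a single value, while $p_1$ alone takes several values. The function names and the sample point are illustrative, and a check at one point is only a sanity test.

```python
from itertools import permutations

def p(u1, u2, u3):
    """The polynomial p_1 = u_1^2 + u_2 u_3 of Example 16.1.15."""
    return u1 ** 2 + u2 * u3

def h_of_orbit(u):
    """Value of p_1 p_2 + p_1 p_3 + p_2 p_3 at the point u."""
    u1, u2, u3 = u
    p1, p2, p3 = p(u1, u2, u3), p(u2, u3, u1), p(u3, u1, u2)
    return p1 * p2 + p1 * p3 + p2 * p3

def is_invariant(u):
    """True if h_of_orbit takes one value on all permutations of u."""
    return len({h_of_orbit(v) for v in permutations(u)}) == 1

point = (1, 2, 4)                                   # arbitrary sample point
print(is_invariant(point))                          # the symmetric combination
print(len({p(*v) for v in permutations(point)}))    # number of values of p_1 alone
```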


16.2 THE DISCRIMINANT 


The most important symmetric polynomial, aside from the elementary symmetric functions,
is the discriminant of the polynomial

$P(x) = x^n - s_1 x^{n-1} + s_2 x^{n-2} - \cdots \pm s_n$

with the variable roots $u_1, \ldots, u_n$. By definition, the discriminant is

(16.2.1) $D(u) = (u_1 - u_2)^2 (u_1 - u_3)^2 \cdots (u_{n-1} - u_n)^2 = \prod_{i<j} (u_i - u_j)^2$.

Its main properties are:

• $D(u)$ is a symmetric polynomial with integer coefficients.
• If $\alpha_1, \ldots, \alpha_n$ are elements of a field, then $D(\alpha) = 0$ if and only if two of the
elements $\alpha_i$ are equal.

The Symmetric Functions Theorem tells us that the discriminant $D$ can be written
uniquely as an integer polynomial in the elementary symmetric functions. Let

(16.2.2) $\Delta(z) = \Delta(z_1, \ldots, z_n)$

be that polynomial, so that $D(u) = \Delta(s)$. When $n = 2$,

(16.2.3) $D = (u_1 - u_2)^2 = s_1^2 - 4 s_2$, and $\Delta(z) = z_1^2 - 4 z_2$.

This is the familiar formula for the discriminant of the quadratic polynomial $x^2 - s_1 x + s_2$,
though the fact that $D$ is the square of the difference of the roots wasn't emphasized when I
was in school.

Unfortunately, $D$ and $\Delta$ are very complicated when $n$ is larger. I don't know what they
are when $n > 3$. The discriminant of the general cubic polynomial

(16.2.4) $P(x) = x^3 - s_1 x^2 + s_2 x - s_3$

is already too complicated to remember:

(16.2.5)
$D = (u_1 - u_2)^2 (u_1 - u_3)^2 (u_2 - u_3)^2 = -4 s_1^3 s_3 + s_1^2 s_2^2 + 18 s_1 s_2 s_3 - 4 s_2^3 - 27 s_3^2$,

$\Delta = -4 z_1^3 z_3 + z_1^2 z_2^2 + 18 z_1 z_2 z_3 - 4 z_2^3 - 27 z_3^2$.


These formulas remain true when substitutions are made for the variables $u_i$. If we are
given particular elements $\alpha_1, \ldots, \alpha_n$ in a ring $R$, and if

$(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n) = x^n - a_1 x^{n-1} + a_2 x^{n-2} - \cdots \pm a_n$,

then, substituting $\alpha_i$ for $u_i$,

$D(\alpha_1, \ldots, \alpha_n) = \prod_{i<j} (\alpha_i - \alpha_j)^2 = \Delta(a_1, \ldots, a_n)$.

Whether or not a polynomial $f(x) = x^n - a_1 x^{n-1} + a_2 x^{n-2} - \cdots \pm a_n$ is a product of linear
factors, its discriminant is defined to be the element $\Delta(a_1, \ldots, a_n)$, where $\Delta(z)$ is the
polynomial (16.2.2). If $f$ has coefficients in a field $F$, then $\Delta(z)$ has coefficients in $F$ and
$\Delta(a)$ is an element of $F$.
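For a cubic, the equality $\prod_{i<j} (\alpha_i - \alpha_j)^2 = \Delta(a_1, a_2, a_3)$ can be tested numerically. The sketch below builds the coefficients from integer roots and compares the two sides, using the cubic formula (16.2.5) for $\Delta$. Function names and sample roots are illustrative.

```python
from itertools import combinations
from math import prod

def delta_cubic(z1, z2, z3):
    """The polynomial Delta(z) of (16.2.5) for n = 3."""
    return (-4 * z1**3 * z3 + z1**2 * z2**2 + 18 * z1 * z2 * z3
            - 4 * z2**3 - 27 * z3**2)

def coefficients_from_roots(roots):
    """(a_1, a_2, a_3) with (x - r1)(x - r2)(x - r3) = x^3 - a_1 x^2 + a_2 x - a_3."""
    return tuple(sum(prod(c) for c in combinations(roots, k)) for k in (1, 2, 3))

def discriminant_from_roots(roots):
    """Product of the squared differences of the roots."""
    return prod((a - b) ** 2 for a, b in combinations(roots, 2))

for roots in [(1, 2, 3), (0, -1, 4), (2, 2, 7)]:   # the last triple has a repeated root
    a = coefficients_from_roots(roots)
    print(roots, discriminant_from_roots(roots), delta_cubic(*a))
```

The triple with a repeated root should give discriminant zero on both sides, in line with the second property listed after (16.2.1).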

The discriminant of a cubic becomes simpler when the coefficient of $x^2$ in $f(x)$ is zero.
Provided that the characteristic is not 3, the quadratic term in the general polynomial (16.2.4)
can be eliminated by a substitution analogous to completing squares, called a Tschirnhausen
transformation,

(16.2.6) $x = y + s_1/3$.

If we write a cubic whose quadratic term vanishes as

(16.2.7) $f(x) = x^3 + px + q$,

the discriminant is obtained by substituting into (16.2.5):

(16.2.8) $\Delta(0, p, -q) = -4p^3 - 27q^2$.
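
The substitution behind (16.2.8) can be written out term by term. Comparing $x^3 + px + q$ with $x^3 - a_1 x^2 + a_2 x - a_3$ gives $a_1 = 0$, $a_2 = p$, $a_3 = -q$, and every term of (16.2.5) that contains $z_1$ drops out. The two sample cubics in the comments are worked here only as arithmetic illustrations.

```latex
\begin{align*}
\Delta(0, p, -q)
  &= -4\cdot 0^3\cdot(-q) + 0^2\cdot p^2 + 18\cdot 0\cdot p\cdot(-q) - 4p^3 - 27(-q)^2 \\
  &= -4p^3 - 27q^2 .
\end{align*}
% x^3 + 3x + 1  (p =  3, q = 1):  \Delta = -4(27)  - 27 = -135
% x^3 - 3x + 1  (p = -3, q = 1):  \Delta = -4(-27) - 27 =   81
```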


Since the elementary symmetric function $s_i$ has degree $i$ in the variables $u$, it is
convenient to assign the weight $i$ to the variable $z_i$, and to define the weighted degree of a
monomial $z_1^{e_1} \cdots z_n^{e_n}$ to be $e_1 + 2e_2 + \cdots + n e_n$. Substitution of $s_i$ for $z_i$ into a monomial of
weighted degree $d$ in $z$ yields a polynomial of ordinary degree $d$ in $u_1, \ldots, u_n$. For instance,
$z_1 z_2$ has weighted degree 3, and $s_1 s_2 = (u_1 + \cdots)(u_1 u_2 + \cdots)$ has degree 3. If $g(u)$ is a
symmetric polynomial of degree $d$, and if $G(z)$ is the polynomial such that $g(u) = G(s)$,
then $G$ will have weighted degree $d$ in $z$.
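The monomials of a given weighted degree are easy to enumerate by brute force. The sketch below lists the exponent tuples $(e_1, \ldots, e_n)$ with $e_1 + 2e_2 + \cdots + n e_n = d$; the function name is an illustrative choice.

```python
from itertools import product

def monomials_of_weighted_degree(n, d):
    """Exponent tuples (e_1, ..., e_n) with e_1 + 2 e_2 + ... + n e_n = d."""
    ranges = [range(d // i + 1) for i in range(1, n + 1)]
    return [e for e in product(*ranges)
            if sum(i * ei for i, ei in enumerate(e, start=1)) == d]

# Weighted degree 3 in z_1, z_2, z_3: z_1^3, z_1 z_2, z_3.
print(monomials_of_weighted_degree(3, 3))
# Weighted degree 6 in z_1, z_2, z_3, the case relevant to the cubic discriminant.
print(len(monomials_of_weighted_degree(3, 6)))
```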

The discriminant of the cubic (16.2.4) is a homogeneous polynomial of degree 6 in $u$.
There are seven monomials in $z_1, z_2, z_3$ of weighted degree 6:

(16.2.9) $z_1^6,\ z_1^4 z_2,\ z_1^2 z_2^2,\ z_2^3,\ z_1^3 z_3,\ z_1 z_2 z_3,\ z_3^2$,

and $\Delta$ is an integer combination of those monomials. We'll determine the coefficients
of the first four of these monomials using the systematic method: We set $u_3 = 0$ in
$D = (u_1 - u_2)^2 (u_1 - u_3)^2 (u_2 - u_3)^2$, obtaining the symmetric polynomial $(u_1 - u_2)^2 u_1^2 u_2^2 =
((s_1^\circ)^2 - 4 s_2^\circ)(s_2^\circ)^2$ in $u_1, u_2$. Therefore $D = s_1^2 s_2^2 - 4 s_2^3 + s_3 h$, where $h$ is a symmetric cubic
polynomial. The coefficients of $s_1^6$ and $s_1^4 s_2$ are zero. I don't know an easy way to determine
the remaining three coefficients of $\Delta$, but one way is to assign some special values to the
variables $u_1, u_2, u_3$.
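The "special values" approach for the last three coefficients can be sketched as a small linear system. Write $D = s_1^2 s_2^2 - 4 s_2^3 + s_3 (a\, s_1^3 + b\, s_1 s_2 + c\, s_3)$ with unknown $a, b, c$, evaluate both sides at three points with $s_3 \neq 0$, and solve exactly over the rationals. The three points and the helper names below are arbitrary choices made for this illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def sym(u):
    """(s_1, s_2, s_3) evaluated at the triple u."""
    return tuple(sum(prod(c) for c in combinations(u, k)) for k in (1, 2, 3))

def disc(u):
    """Product of squared differences of the entries of u."""
    return prod((x - y) ** 2 for x, y in combinations(u, 2))

def solve(matrix, rhs):
    """Gauss-Jordan elimination over the rationals (square, nonsingular system)."""
    n = len(rhs)
    rows = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[n] for row in rows]

# Unknowns: the coefficients (a, b, c) of z1^3 z3, z1 z2 z3, z3^2 in Delta.
points = [(1, 1, 1), (1, 1, -1), (1, 2, 3)]
matrix, rhs = [], []
for u in points:
    s1, s2, s3 = sym(u)
    matrix.append([s1**3 * s3, s1 * s2 * s3, s3**2])
    rhs.append(disc(u) - (s1**2 * s2**2 - 4 * s2**3))
coeffs = solve(matrix, rhs)
print(coeffs)   # compare with the coefficients -4, 18, -27 in (16.2.5)
```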


16.3 SPLITTING FIELDS 


Let $f$ be a polynomial with coefficients in a field $F$, not necessarily an irreducible polynomial.
A splitting field for $f$ over $F$ is an extension field $K/F$ such that

• $f$ splits completely in $K$, say $f(x) = (x - \alpha_1) \cdots (x - \alpha_n)$ with $\alpha_i$ in $K$, and
• $K$ is generated by the roots: $K = F(\alpha_1, \ldots, \alpha_n)$.

The second condition implies that, for every element $\beta$ of $K$, there is a polynomial
$p(u_1, \ldots, u_n)$ with coefficients in $F$, such that $p(\alpha_1, \ldots, \alpha_n) = \beta$. In fact there will be
many such polynomials: Since the roots are algebraic over $F$, some polynomials evaluate
to zero.

If our field F is a subfield of the complex numbers C, a splitting field K can be obtained 
simply by adjoining the complex roots of f to F, and we may refer to K as the splitting field 
of f. But if F is not a subfield of C, we have to construct a splitting field abstractly, as was 
explained in the last chapter (Section 15.6). 


Lemma 16.3.1 


(a) If $F \subset L \subset K$ are fields, and if $K$ is a splitting field of a polynomial $f$ over $F$, then $K$ is
also a splitting field of the same polynomial over $L$.

(b) Every polynomial f(x) in F[x] has a splitting field. 

(c) A splitting field is a finite extension of F, and every finite extension is contained in a 
splitting field. 


Proof. (a) This is obvious. 


(b) Given a polynomial $f$ with coefficients in $F$, there is a field extension $K'$ of $F$ in which
$f$ splits completely (15.6.3). The subfield of $K'$ generated by the roots of $f$ will be a splitting
field.

(c) A splitting field is generated by finitely many elements that are algebraic over $F$, so
it is a finite extension of $F$. Conversely, a finite extension $L/F$ is generated by finitely
many elements, say $\gamma_1, \ldots, \gamma_k$, each of which is algebraic over $F$. Let $g_i$ be the irreducible
polynomial for $\gamma_i$ over $F$, and let $f$ be the product $g_1 \cdots g_k$. We may extend the field $L$ to a
splitting field $K$ of $f$ over $L$, and then $K$ will be a splitting field over $F$ too. □

We now use symmetric functions to prove an amazing fact:


Theorem 16.3.2 Splitting Theorem. Let K be an extension of a field F that is a splitting 
field of a polynomial f(x) with coefficients in F. If an irreducible polynomial g(x) with 
coefficients in F has one root in K, then it splits completely in K. 


This theorem provides a characteristic property of splitting fields. A splitting field K over F 
is a finite field extension with this property: 


An irreducible polynomial over F with one root in K splits completely in K. 


Which polynomial is used to define K as a splitting field is not important. 


Proof of the Splitting Theorem. Let $f$ and $g$ be as in the statement of the theorem. We are
given a root $\beta_1$ of $g$ in $K$, and we must show that $g$ splits completely in $K$. Since $g$ is
irreducible, it is the irreducible polynomial for $\beta_1$ over $F$.

The splitting field $K$ is generated over $F$ by the roots $\alpha_1, \ldots, \alpha_n$ of $f$. Every element
of $K$ can be written as a polynomial in $\alpha$, with coefficients in $F$. We choose a polynomial
$p_1(u_1, \ldots, u_n)$ such that $p_1(\alpha) = \beta_1$.

Let $\{p_1, \ldots, p_k\}$ be the orbit of $p_1(u)$ for the operation of the symmetric group $S_n$
on the polynomial ring $F[u_1, \ldots, u_n]$, and let $\beta_j = p_j(\alpha)$. So $\beta_1, \ldots, \beta_k$ are elements of
$K$. We will prove the splitting theorem by showing that the polynomial

$h(x) = (x - \beta_1) \cdots (x - \beta_k)$

has coefficients in $F$. Suppose that this has been proved. Then since $\beta_1$ is a root of $h$, it will
follow that the irreducible polynomial for $\beta_1$ over $F$, which is $g$, divides $h$, and since $h$ splits
completely in $K$, $g$ does too.

Say that $h(x) = x^k - b_1 x^{k-1} + b_2 x^{k-2} - \cdots \pm b_k$. The coefficients $b_1, \ldots, b_k$
are obtained by evaluating elementary symmetric functions at $\beta = \beta_1, \ldots, \beta_k$. But these are
the elementary symmetric functions in $k$ variables. We introduce new variables $w_1, \ldots, w_k$,
and we label the elementary symmetric functions in these variables as $s'_1(w), \ldots, s'_k(w)$,
using a prime to remind us that the variables are the new ones. Then $b_j = s'_j(\beta)$.

We evaluate $s'_j$ in two steps: First, we substitute $w = p$, i.e., $w_j = p_j(u)$. Because
$s'_j(w)$ is symmetric in $w$, $s'_j(p)$ is a symmetric polynomial in $u$ (16.1.14). Next, we substitute
$u_i = \alpha_i$. Because $s'_j(p(u))$ is symmetric in $u$, $s'_j(p(\alpha))$ is in the field $F$ (16.1.12). On the
other hand, $s'_j(p(\alpha)) = s'_j(\beta) = b_j$. The coefficients $b_j$ are in $F$. □


16.4 ISOMORPHISMS OF FIELD EXTENSIONS 


For the rest of the chapter, we assume that our fields have characteristic zero, and we won't 
mention this assumption again. The field extensions that we consider will be finite extensions. 
We need a few definitions: 


• Let $K$ and $K'$ be extension fields of $F$. The concept of an $F$-isomorphism $\sigma: K \to K'$ was
introduced before (see (15.2.9)). It is an isomorphism whose restriction to the subfield $F$ is
the identity map. An $F$-automorphism of an extension field $K$ is an $F$-isomorphism from $K$
to itself. The $F$-automorphisms of $K$ are the symmetries of the field extension.

• The $F$-automorphisms of a finite extension $K$ form a group called the Galois group of $K$
over $F$, which is often denoted by $G(K/F)$.

• A finite extension $K/F$ is a Galois extension if the order of its Galois group $G(K/F)$ is
equal to the degree of the extension: $|G(K/F)| = [K:F]$.


We will see below (16.6.2) that the order of the Galois group always divides the degree 
of the extension. 


Example 16.4.1 The complex number field $\mathbb{C}$ is a Galois extension of the field $\mathbb{R}$ of real
numbers. The Galois group $G(\mathbb{C}/\mathbb{R})$ is a cyclic group of order two, generated by the
automorphism of complex conjugation. There is an analogous statement for any quadratic
extension $K/F$. A quadratic extension is obtained by adjoining a square root, say that
$K = F(\alpha)$, where $\alpha^2 = a$ is in $F$. The Galois group $G$ of $K/F$ has order two, and the
element $\tau$ of $G$ different from the identity interchanges the two square roots $\alpha$ and $-\alpha$.
For instance, if $F = \mathbb{Q}$ and $K = \mathbb{Q}(\sqrt{2})$, there is an $F$-automorphism $\tau$ of $K$ that sends
$a + b\sqrt{2} \mapsto a - b\sqrt{2}$. We have seen this automorphism before. □
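To make the automorphism $\tau$ of $\mathbb{Q}(\sqrt{2})$ concrete, the sketch below stores $a + b\sqrt{2}$ as a pair of rationals and checks on sample elements that $\tau$ respects sums and products, has order two, and fixes the rational numbers. The pair representation and the sample values are assumptions made for this illustration.

```python
from fractions import Fraction

def add(x, y):
    """Sum of a + b*sqrt(2) and c + d*sqrt(2), each stored as a pair."""
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    """Product of a + b*sqrt(2) and c + d*sqrt(2)."""
    (a, b), (c, d) = x, y
    return (a * c + 2 * b * d, a * d + b * c)

def tau(x):
    """The nontrivial Q-automorphism a + b*sqrt(2) -> a - b*sqrt(2)."""
    return (x[0], -x[1])

x = (Fraction(3), Fraction(1, 2))      # 3 + (1/2) sqrt(2)
y = (Fraction(-2, 3), Fraction(5))     # -2/3 + 5 sqrt(2)

print(tau(mul(x, y)) == mul(tau(x), tau(y)))   # tau is multiplicative
print(tau(add(x, y)) == add(tau(x), tau(y)))   # tau is additive
print(tau(tau(x)) == x)                        # tau has order two
```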


Lemma 16.4.2 Let $K$ and $K'$ be extensions of a field $F$.

(a) Let $f(x)$ be a polynomial with coefficients in $F$, and let $\sigma$ be an $F$-isomorphism from
$K$ to $K'$. If $\alpha$ is a root of $f$ in $K$, then $\sigma(\alpha)$ is a root of $f$ in $K'$.

(b) Suppose that $K$ is generated over $F$ by some elements $\alpha_1, \ldots, \alpha_n$. Let $\sigma$ and $\sigma'$ be
$F$-isomorphisms $K \to K'$. If $\sigma(\alpha_i) = \sigma'(\alpha_i)$ for $i = 1, \ldots, n$, then $\sigma = \sigma'$. If an
$F$-automorphism $\sigma$ of $K$ fixes all of the generators, it is the identity map.

(c) Let $f$ be an irreducible polynomial with coefficients in $F$, and let $\alpha$ and $\alpha'$ be roots of
$f$ in $K$ and $K'$, respectively. There is a unique $F$-isomorphism $\sigma: F(\alpha) \to F(\alpha')$ that
sends $\alpha$ to $\alpha'$. If $F(\alpha) = F(\alpha')$, then $\sigma$ is an $F$-automorphism.

Proof. (a) was proved in the last chapter (15.2.10). We omit the proof of (b). In (c), the
existence of $\sigma$ was proved in the last chapter (15.2.8), and (b) shows that $\sigma$ is unique. □


Proposition 16.4.3 


(a) Let $f$ be a polynomial with coefficients in $F$. An extension field $L/F$ contains at most
one splitting field of $f$ over $F$.

(b) Let $f$ be a polynomial with coefficients in $F$. Any two splitting fields of $f$ over $F$ are
isomorphic extension fields.

Proof. (a) If $L$ contains a splitting field of $f$, then $f$ splits completely in $L$, say $f =
(x - \alpha_1) \cdots (x - \alpha_n)$ with $\alpha_i$ in $L$. If $\beta$ is any root of $f$ in $L$, substitution into this product
shows that $\beta = \alpha_i$ for some $i$. So $f$ has no other roots in $L$, and the only splitting field of $f$
that is contained in $L$ is $F(\alpha_1, \ldots, \alpha_n)$.

(b) Let $K_1$ and $K_2$ be two splitting fields of $f$ over $F$. The first splitting field $K_1$ is a finite
extension of $F$, so it has a primitive element $\gamma$. Let $g$ be the irreducible polynomial for $\gamma$
over $F$. We choose an extension $L$ of the second field $K_2$ in which $g$ has a root $\gamma'$, and we let
$K'$ denote the subfield $F(\gamma')$ of $L$ generated by $\gamma'$. There is an $F$-isomorphism $\varphi: K_1 \to K'$
that sends $\gamma$ to $\gamma'$, and because $K'$ is $F$-isomorphic to the splitting field $K_1$, it is also a
splitting field of $f$. Then both $K'$ and $K_2$ are splitting fields contained in the field $L$, and (a)
shows that they are equal. Therefore $\varphi$ is an $F$-isomorphism from $K_1$ to $K_2$. □


16.5 FIXED FIELDS 


Let $H$ be a group of automorphisms of a field $K$. The fixed field of $H$, which is often denoted
by $K^H$, is the set of elements of $K$ that are fixed by every group element:

(16.5.1) $K^H = \{\alpha \in K \mid \sigma(\alpha) = \alpha \text{ for all } \sigma \text{ in } H\}$.

It is easy to verify that $K^H$ is a subfield of $K$, and that $H$ is a subgroup of the Galois group
$G(K/K^H)$. The Fixed Field Theorem below shows that, in fact, $H$ is equal to $G(K/K^H)$.


Theorem 16.5.2 Let $H$ be a finite group of automorphisms of a field $K$ and let $F$ denote
the fixed field $K^H$. Let $\beta_1$ be an element of $K$, and let $\{\beta_1, \ldots, \beta_r\}$ be the $H$-orbit of $\beta_1$.

(a) The irreducible polynomial for $\beta_1$ over $F$ is $g(x) = (x - \beta_1) \cdots (x - \beta_r)$.
(b) $\beta_1$ is algebraic over $F$, and its degree over $F$ is equal to the order of its orbit. Therefore
the degree of $\beta_1$ over $F$ divides the order of $H$.

Proof. Part (b) of the theorem follows from (a). We prove (a). Say that

$g(x) = (x - \beta_1) \cdots (x - \beta_r) = x^r - b_1 x^{r-1} + \cdots \pm b_r$.

The coefficients of $g$ are symmetric functions of the orbit $\{\beta_1, \ldots, \beta_r\}$ (16.1.5). Since the
elements of $H$ permute the orbit, they fix the coefficients. Therefore $g$ has coefficients in the
fixed field.

Let $h$ be a polynomial with coefficients in $F$ that has $\beta_1$ as a root. For $i = 1, \ldots, r$,
there is an element $\sigma$ of $H$ such that $\sigma(\beta_1) = \beta_i$. Because the elements of $H$ are
$F$-automorphisms of $K$ and because $h$ has coefficients in $F$, $\beta_i$ is also a root of $h$ (16.4.2)(a).
So $x - \beta_i$ divides $h$. Since this is true for every $i$, $g$ divides $h$ in $K[x]$ and in $F[x]$ (15.6.4)(b).
This shows that $g$ generates the principal ideal of polynomials in $F[x]$ with root $\beta_1$, and
that $g$ is the irreducible polynomial for $\beta_1$ over $F$ (15.2.3). □


An extension field K/F is called algebraic if every element of K is algebraic over F. 


Lemma 16.5.3 Let $K$ be an algebraic extension of a field $F$ that is not a finite extension of
$F$. There exist elements in $K$ whose degrees over $F$ are arbitrarily large.

Proof. We form a chain of intermediate fields $F < F_1 < F_2 < \cdots$ as follows: We choose an
element $\alpha_1$ of $K$ that is not in $F$, and we let $F_1 = F(\alpha_1)$. Then $\alpha_1$ is algebraic over $F$, so
$[F_1 : F] < \infty$, and therefore $F_1 < K$. Next, we choose an element $\alpha_2$ of $K$ that is not in $F_1$,
and we let $F_2 = F(\alpha_1, \alpha_2)$. Then $[F_2 : F] < \infty$ and $F_1 < F_2 < K$. We choose $\alpha_3$ in $K$, not in
$F_2$, etc. This chain of fields gives us a strictly increasing chain of finite extensions of $F$. The
degrees $[F_i : F]$ become arbitrarily large, while remaining finite. Each extension $F_i/F$ has a
primitive element $\gamma_i$, and the degrees of $\gamma_i$ over $F$ become arbitrarily large too. □


Theorem 16.5.4 Fixed Field Theorem. Let $H$ be a finite group of automorphisms of a field
$K$, and let $F = K^H$ be its fixed field. Then $K$ is a finite extension of $F$, and its degree $[K:F]$
is equal to the order $|H|$ of the group.

Proof. Let $F = K^H$ and let $n$ be the order of $H$. Theorem 16.5.2 shows that the extension
$K/F$ is algebraic, and that the degree over $F$ of any element $\beta$ of $K$ divides $n$. Therefore
the degree $[K:F]$ is finite (16.5.3). Let $\gamma$ be a primitive element for this extension. Every
element $\sigma$ of $H$ is the identity on $F$, so if $\sigma$ also fixes $\gamma$, it will be the identity map, that is, the
identity element of $H$. Therefore the stabilizer of $\gamma$ is the trivial subgroup $\{1\}$ of $H$, and the
orbit of $\gamma$ has order $n$. Theorem 16.5.2 shows that $\gamma$ has degree $n$ over $F$. Since $K = F(\gamma)$,
the degree $[K:F]$ is equal to $n$ too. □


Automorphisms of the field C(t) of rational functions in one variable provide examples 
that illustrate the Fixed Field Theorem and Theorem 16.5.2. 


Example 16.5.5 Let $K = \mathbb{C}(t)$, and let $\sigma$ and $\tau$ be the automorphisms of $K$ that are the
identity on $\mathbb{C}$ and such that $\sigma(t) = it$ and $\tau(t) = t^{-1}$. Then $\sigma^4 = 1$, $\tau^2 = 1$, and $\tau\sigma = \sigma^{-1}\tau$.
Therefore $\sigma$ and $\tau$ generate a group of automorphisms $H$ that is isomorphic to the dihedral
group $D_4$.

Lemma 16.5.6 The rational function $u = t^4 + t^{-4}$ is transcendental over $\mathbb{C}$.

Proof. Let $g(x) = x^d + c_{d-1} x^{d-1} + \cdots + c_0$ be a monic polynomial of degree $d$ with complex
coefficients. Then $t^{4d} g(u)$ is a monic polynomial of degree $8d$ in $t$. Since $t$ is transcendental,
$t^{4d} g(u) \neq 0$, and therefore $g(u) \neq 0$. □

It follows from the lemma that the field $\mathbb{C}(u)$ is isomorphic to a field of rational
functions in one variable. We show that it is the fixed field $K^H$. We note that $u$ is fixed by $\sigma$
and $\tau$. So it is in the fixed field $K^H$, and therefore $\mathbb{C}(u) \subset K^H$. Theorem 16.5.2 tells us that
the irreducible polynomial for $t$ over $K^H$ is the polynomial whose roots form its orbit. The
orbit of $t$ is

$\{t,\ it,\ -t,\ -it,\ t^{-1},\ -it^{-1},\ -t^{-1},\ it^{-1}\}$

and the polynomial whose roots are the elements of this orbit is

$(x^4 - t^4)(x^4 - t^{-4}) = x^8 - u x^4 + 1$.

So $t$ is a root of a polynomial of degree 8 with coefficients in $\mathbb{C}(u)$, and therefore the
degree $[K : \mathbb{C}(u)]$ is at most 8. The Fixed Field Theorem asserts that $[K : K^H] = 8$. Since
$\mathbb{C}(u) \subset K^H$, it follows that $\mathbb{C}(u) = K^H$. □
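The claims of this example can be spot-checked numerically by treating rational functions as Python functions of a complex number and evaluating at one sample point. This is a floating-point sanity check under that assumption, not an exact computation in $\mathbb{C}(t)$; the sample point `t0` and the tolerance are arbitrary.

```python
def sigma(f):
    """sigma sends the function f(t) to f(i t)."""
    return lambda t: f(1j * t)

def tau(f):
    """tau sends the function f(t) to f(1/t)."""
    return lambda t: f(1 / t)

def u(t):
    return t ** 4 + t ** -4

t0 = 1.3 + 0.7j                       # arbitrary sample point
# The eight orbit elements i^k t and i^k t^(-1), evaluated at t0.
orbit = [1j ** k * t0 ** e for e in (1, -1) for k in range(4)]

def residual(x):
    """Value of x^8 - u x^4 + 1 at the sample point."""
    return x ** 8 - u(t0) * x ** 4 + 1

tol = 1e-9
print(abs(sigma(u)(t0) - u(t0)) < tol, abs(tau(u)(t0) - u(t0)) < tol)  # u is fixed
print(all(abs(residual(x)) < tol for x in orbit))                      # orbit gives roots
```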

This example illustrates a famous theorem:

Theorem 16.5.7 Lüroth's Theorem. Let $F$ be a subfield of the field $\mathbb{C}(t)$ of rational functions
that contains $\mathbb{C}$ and is not $\mathbb{C}$ itself. Then $F$ is isomorphic to a field $\mathbb{C}(u)$ of rational
functions. □


16.6 GALOIS EXTENSIONS 


We come now to the main topic of the chapter: Galois theory. 


• If $K$ is an extension field of $F$, an intermediate field $L$ is a field such that $F \subset L \subset K$. An
intermediate field is proper if it is neither $F$ nor $K$.

If $L$ is an intermediate field, then every $L$-automorphism of $K$ will be an
$F$-automorphism, and therefore

(16.6.1) $G(K/L) \subset G(K/F)$.


Lemma 16.6.2
(a) The Galois group $G$ of a finite field extension $K/F$ is a finite group whose order divides
the degree $[K:F]$ of the extension.

(b) Let $H$ be a finite group of automorphisms of a field $K$. Then $K$ is a Galois extension of
its fixed field $K^H$, and $H$ is the Galois group of $K/K^H$.

Proof. (a) By definition of $F$-automorphism, the elements of $G$ act trivially on $F$, so $F$ is
contained in the fixed field $K^G$. Then $F \subset K^G \subset K$, so $[K:K^G]$ divides $[K:F]$. By the
Fixed Field Theorem, $|G| = [K:K^G]$.

(b) By definition of $K^H$, the elements of $H$ are $K^H$-automorphisms. Therefore $H$ is
a subgroup of the Galois group $G(K/K^H)$. Since $|G(K/K^H)|$ divides $[K:K^H]$ and
$|H| = [K:K^H]$, the two groups are equal, and $K$ is a Galois extension of $K^H$. □


Lemma 16.6.3 Let $\gamma_1$ be a primitive element for a finite extension $K$ of a field $F$ and let
$f(x)$ be the irreducible polynomial for $\gamma_1$ over $F$. Let $\gamma_1, \ldots, \gamma_r$ be the roots of $f$ that are
in $K$. There is a unique $F$-automorphism $\sigma_i$ of $K$ such that $\sigma_i(\gamma_1) = \gamma_i$. These are all of the
$F$-automorphisms of $K$, so $G(K/F)$ has order $r$.

Proof. There is a unique $F$-isomorphism $\sigma_i: F(\gamma_1) \to F(\gamma_i)$ that sends $\gamma_1 \mapsto \gamma_i$ (16.4.2)(c).
We are given that $K = F(\gamma_1)$, and since $F(\gamma_i)$ has the same degree over $F$, $K = F(\gamma_i)$ too.
Therefore $\sigma_i$ is an $F$-automorphism of $K$. Every $F$-automorphism of $K$ sends $\gamma_1$ to a root
of $f$, so it is one of the automorphisms $\sigma_i$. □


Theorem 16.6.4 Characteristic Properties of Galois Extensions. Let $K/F$ be a finite extension
and let $G$ be its Galois group. The following are equivalent:

(a) $K/F$ is a Galois extension, i.e., $|G| = [K:F]$,

(b) The fixed field $K^G$ is equal to $F$,

(c) $K$ is a splitting field over $F$.

Part (b) of the theorem can be used to show that an element of a Galois extension $K$
is actually in the field $F$, and (c) can be used to show that an extension is Galois.

Proof of the Theorem. (a) $\Leftrightarrow$ (b): By the Fixed Field Theorem, $|G| = [K:K^G]$. Since
$F \subset K^G \subset K$, $|G| = [K:F]$ if and only if $F = K^G$.

(a) $\Leftrightarrow$ (c): Let $n = [K:F]$. We choose a primitive element $\gamma_1$ for $K$ over $F$. Let $f$ be
its irreducible polynomial over $F$. Since $\gamma_1$ is a primitive element, the degree of $f$ is $n$.
Let $\gamma_1, \ldots, \gamma_r$ be the roots of $f$ that are in $K$. Lemma 16.6.3 tells us that $|G| = r$. So
$|G| = [K:F]$, i.e., the extension is Galois, if and only if $f$ splits completely in $K$. Because
$K$ is generated over $F$ by $\gamma_1$, it is also generated by the set of all the roots of $f$, so $K$ is a
splitting field over $F$ if and only if $f$ splits completely in $K$. □

If $K$ is the splitting field of a polynomial $f$ over $F$, we may also refer to the Galois
group $G(K/F)$ of the extension $K/F$ as the Galois group of $f$.


Corollary 16.6.5 


(a) Every finite extension K/F is contained in a Galois extension. 


(b) If $K/F$ is a Galois extension, and if $L$ is an intermediate field, then $K$ is also a Galois
extension of $L$, and the Galois group $G(K/L)$ is a subgroup of the Galois group $G(K/F)$.

Proof. Theorem 16.6.4 allows us to replace the phrase "Galois extension" by "splitting
field." Then the Corollary follows from Lemmas 16.3.1 and 16.6.2. □


Theorem 16.6.6 Let $K/F$ be a Galois extension with Galois group $G$, and let $g$ be
a polynomial with coefficients in $F$ that splits completely in $K$. Let its roots in $K$ be
$\beta_1, \ldots, \beta_r$.

(a) The group $G$ operates on the set of roots $\{\beta_i\}$.

(b) If $K$ is a splitting field of $g$ over $F$, the operation on the roots is faithful, and by its
operation on the roots, $G$ embeds as a subgroup of the symmetric group $S_r$.

(c) If $g$ is irreducible over $F$, the operation on the roots is transitive.

(d) If $K$ is a splitting field of $g$ over $F$ and $g$ is irreducible over $F$, then $G$ embeds as a
transitive subgroup of $S_r$.

Proof. (a) is (16.4.2)(a) and (b) is (16.4.2)(b). If $g$ is irreducible, it is the irreducible
polynomial for $\beta_1$ over $F$. Since $F$ is the fixed field of $G$, Theorem 16.5.2 tells us that the
roots $\beta_i$ of $g$ form the $G$-orbit of $\beta_1$. So the operation is transitive, as (c) asserts. Finally, (d)
is the combination of (b) and (c). □

This theorem is useful, though it doesn't suffice to determine the Galois group. Both the
integer $r$ and the embedding into $S_r$ depend on $g$, not only on the Galois extension $K$. Also,
the symmetric group $S_r$ has several transitive subgroups when $r > 2$.


16.7 THE MAIN THEOREM 


One of the most important parts of Galois theory is the determination of the intermediate 
fields. The Main Theorem of Galois theory asserts that when K / F is a Galois extension, the 
intermediate fields are in bijective correspondence with the subgroups of the Galois group.
It will not be immediately clear why this fact is important; we will have to see it used to
understand that.

Theorem 16.7.1 Main Theorem. Let $K$ be a Galois extension of a field $F$, and let $G$ be its
Galois group. There is a bijective correspondence between subgroups of $G$ and intermediate
fields:

$\{\text{subgroups}\} \longleftrightarrow \{\text{intermediate fields}\}$.

This correspondence associates to a subgroup $H$ its fixed field, and to an intermediate field $L$
the Galois group of $K$ over $L$. The maps

$H \mapsto K^H$ and $L \mapsto G(K/L)$

are inverse functions.

Proof. We must show that the composition of the two maps in either order is the identity
map, and the work has been done. Let $H$ be a subgroup of $G$ and let $L$ be its fixed field.
The Fixed Field Theorem tells us that $G(K/L) = H$. In the other order, let $L$ be an
intermediate field and let $H$ be the Galois group of $K$ over $L$. Then $K$ is a Galois extension
of $L$ (Corollary 16.6.5(b)). Theorem 16.6.4 tells us that the fixed field of $H$ is $L$. □


Corollary 16.7.2 (a) The correspondence given by the Main Theorem reverses inclusions:
If $L$ and $L'$ are intermediate fields and if $H$ and $H'$ are the corresponding subgroups, then
$L \subset L'$ if and only if $H \supset H'$.

(b) The subgroup that corresponds to the field $F$ is the whole group $G(K/F)$, and the
subgroup that corresponds to $K$ is the trivial subgroup $\{1\}$.

(c) If $L$ corresponds to $H$, then $[K:L] = |H|$ and $[L:F] = [G:H]$.

In (c), the first equality follows from the facts that $K$ is a Galois extension of $L$ and
that $H = G(K/L)$. Then the second equality follows, because

$|G| = [K:F] = [K:L][L:F]$ and also $|G| = |H|[G:H]$. □


Corollary 16.7.3 A finite field extension $K/F$ has finitely many intermediate fields
$F \subset L \subset K$.

Proof. This follows from the Main Theorem when $K/F$ is a Galois extension, because
a finite group has finitely many subgroups. Since we can embed any finite extension into
a Galois extension, it is true for any finite extension. □

Example 16.7.4 Let $F$ be the field of rational numbers, and let $\alpha = \sqrt{3}$ and $\beta = \sqrt{5}$, so that
$\alpha\beta = \sqrt{15}$. The splitting field $K = F(\alpha, \beta)$ of the polynomial $(x^2 - 3)(x^2 - 5)$ is a Galois
extension of $F$ of degree 4. Its Galois group $G$ has order 4, so it is either the Klein four group
or a cyclic group. It is easy to find three intermediate fields of degree 2 over $F$, namely $F(\alpha)$,
$F(\beta)$, and $F(\alpha\beta)$. These three intermediate fields correspond to three proper subgroups of
$G$. Therefore $G$ is the Klein four group, which has three elements of order 2, hence three
subgroups of order 2. The cyclic group of order 4 has only one subgroup of order 2.

The subgroups of order 2 are the only proper subgroups of $G$, so the Main Theorem
tells us that there are no proper intermediate fields other than the three we have found.
Consequently, an element $\gamma = a + b\alpha + c\beta + d\alpha\beta$ of $K$, with $a, b, c, d$ in $F$, has degree 4
over $F$ unless it is in one of the three proper intermediate fields, and this happens only when
at least two of the coefficients $b, c, d$ are zero. □
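The correspondence in this example can be made concrete in coordinates. The sketch below stores $a + b\alpha + c\beta + d\alpha\beta$ as the tuple `(a, b, c, d)` and models the four automorphisms as the sign changes $\alpha \mapsto \pm\alpha$, $\beta \mapsto \pm\beta$. That these sign changes are exactly the elements of $G$ is taken from the example; the representation and function names are assumptions made for this illustration.

```python
from itertools import product

def mul(x, y):
    """Product in Q(sqrt 3, sqrt 5) with basis 1, alpha, beta, alpha*beta."""
    a, b, c, d = x
    e, f, g, h = y
    return (a*e + 3*b*f + 5*c*g + 15*d*h,
            a*f + b*e + 5*(c*h + d*g),
            a*g + c*e + 3*(b*h + d*f),
            a*h + d*e + b*g + c*f)

def automorphism(ea, eb):
    """The map sending alpha -> ea*alpha and beta -> eb*beta, with ea, eb = +1 or -1."""
    return lambda x: (x[0], ea * x[1], eb * x[2], ea * eb * x[3])

group = {signs: automorphism(*signs) for signs in product((1, -1), repeat=2)}

def fixed_basis(signs):
    """Basis elements among 1, alpha, beta, alpha*beta fixed by the given automorphism."""
    names = ("1", "alpha", "beta", "alpha*beta")
    basis = [tuple(int(i == j) for j in range(4)) for i in range(4)]
    return [n for n, e in zip(names, basis) if group[signs](e) == e]

x, y = (1, 2, -1, 3), (0, 1, 4, -2)       # arbitrary sample elements
print(all(s(mul(x, y)) == mul(s(x), s(y)) for s in group.values()))  # multiplicative
print(all(s(s(x)) == x for s in group.values()))                     # each has order <= 2
for signs in group:
    print(signs, fixed_basis(signs))
```

Each non-identity sign change fixes a two-dimensional subspace, spanned by 1 and one of $\alpha$, $\beta$, $\alpha\beta$, matching the three intermediate fields of degree 2.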


Suppose that we are given a chain of fields $F \subset L \subset K$, and that $K$ is a Galois extension
of $F$. Then $K$ is also a Galois extension of $L$. However, $L$ needn't be a Galois extension of $F$.
To complete the picture, we show that the intermediate fields $L$ that are Galois extensions
of $F$ correspond to normal subgroups of $G$.

Theorem 16.7.5 Let $K/F$ be a Galois extension with Galois group $G$, and let $L$ be the fixed
field of a subgroup $H$ of $G$. The extension $L/F$ is a Galois extension if and only if $H$ is a
normal subgroup of $G$. If so, then the Galois group $G(L/F)$ is isomorphic to the quotient
group $G/H$.


[Diagram: the tower of fields $F \subset L \subset K$. $G = G(K/F)$ operates on $K$ fixing $F$; $H = G(K/L)$
operates on $K$ fixing $L$; if $H$ is normal, then $G/H \cong G(L/F)$ operates on $L$.]

Proof. Let $\epsilon_1$ be a primitive element for the extension $L/F$, and let $g$ be the irreducible
polynomial for $\epsilon_1$ over $F$. This polynomial splits completely in the splitting field $K$; let its
roots be $\epsilon_1, \ldots, \epsilon_r$. We have the following facts to work with:

• $L/F$ is a Galois extension if and only if it is a splitting field, which happens when all of
the roots $\epsilon_i$ are in $L$.

• If a root $\epsilon_i$ is in $L$, then $L = F(\epsilon_i)$, because $\epsilon_1$ and $\epsilon_i$ have the same degree over $F$ and
$L = F(\epsilon_1)$.

• An element $\sigma$ of $G$ is the identity on $L$ if and only if it fixes $\epsilon_1$. So the stabilizer of $\epsilon_1$ is
equal to $H$.

• The operation of $G$ on the set $\{\epsilon_1, \ldots, \epsilon_r\}$ is transitive: For any $i = 1, \ldots, r$, there is an
element $\sigma$ of $G$ such that $\sigma(\epsilon_1) = \epsilon_i$ (16.4.2)(c).

Let $\sigma$ be an element of $G$, and say that $\sigma(\epsilon_1) = \epsilon_i$. Then $F(\epsilon_i) = L$ if and only if $\epsilon_i$
is in $L$, and if so, the stabilizer of $\epsilon_i$ will be equal to $H$. On the other hand, the stabilizer
of $\sigma(\epsilon_1)$ is the conjugate group $\sigma H \sigma^{-1}$. Therefore $L/F$ is a Galois extension if and only if
$\sigma H \sigma^{-1} = H$ for all $\sigma$, i.e., if and only if $H$ is a normal subgroup.

Suppose that $L$ is a Galois extension of $F$. Then the roots $\epsilon_i$ are in $L$. An element $\sigma$
of the Galois group $G$ will map $\epsilon_1$ to another root $\epsilon_i$, and therefore it will map $L = F(\epsilon_1)$
to $F(\epsilon_i) = L$. So restricting $\sigma$ to $L$ defines an $F$-automorphism of $L$. This restriction gives
us a homomorphism $\varphi: G \to G(L/F)$. The kernel of $\varphi$ is the set of $\sigma$ that restrict to the
identity on $L$, which is $H$. Moreover, $|G/H| = [G:H] = |G(L/F)|$. The First Isomorphism
Theorem tells us that $G/H$ is isomorphic to $G(L/F)$. □


In the next sections, we examine some of the most important situations in which Galois 
theory can be used. 


16.8 CUBIC EQUATIONS 


Let $f(x) = x^3 - a_1 x^2 + a_2 x - a_3$ be an irreducible polynomial over $F$, and let $K$ be a
splitting field of $f$ over $F$. Say that the roots of $f$ in $K$ are $\alpha_1, \alpha_2, \alpha_3$. Then in $K[x]$,

(16.8.1) $f(x) = (x - \alpha_1)(x - \alpha_2)(x - \alpha_3)$.

Since $a_1$ is in $F$ and $a_1 = \alpha_1 + \alpha_2 + \alpha_3$, the third root $\alpha_3$ is in the field generated by the first
two roots. So we have a chain of extension fields

$F \subset F(\alpha_1) \subset F(\alpha_1, \alpha_2)$ and $F(\alpha_1, \alpha_2) = F(\alpha_1, \alpha_2, \alpha_3) = K$.

Let $L$ denote the field $F(\alpha_1)$. Since $f$ is irreducible over $F$, $[L:F] = 3$. And since $\alpha_1$ is in $L$,
the polynomial $f$ factors in $L[x]$:

(16.8.2) $f(x) = (x - \alpha_1) q(x)$,

where $q$ is the quadratic polynomial whose roots are $\alpha_2$ and $\alpha_3$. So $K$ is obtained from $L$ by
adjoining a root of a quadratic polynomial. There are two cases: If $q$ is irreducible over $L$,
then $[K:L] = 2$ and $[K:F] = 6$. If $q$ is reducible over $L$, then $\alpha_2$ and $\alpha_3$ are in $L$, $L = K$,
and $[K:F] = 3$.


Examples 16.8.3 (a) $f(x) = x^3 + 3x + 1$ is irreducible over $\mathbb{Q}$, and its derivative is nowhere zero on the real line. Therefore $f$ defines an increasing function of the real variable $x$ that takes the value zero exactly once: $f$ has one real root. This root does not generate the splitting field $K$, which also contains two complex roots. So $[K:\mathbb{Q}] = 6$.

(b) $f(x) = x^3 - 3x + 1$ is also irreducible over $\mathbb{Q}$. In this case, it happens that if $\alpha_1$ is a root of $f$, then $\alpha_2 = \alpha_1^2 - 2$ is another root. This can be checked by substituting into $f$. So the splitting field $K$ is equal to $\mathbb{Q}(\alpha_1)$ and $[K:\mathbb{Q}] = 3$. □
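The substitution in Example (b) can be checked mechanically. The sketch below is only an illustration, not part of the text's argument: it computes $f(x^2 - 2)$ modulo $f(x)$ with exact rational arithmetic. A zero remainder means that $\alpha^2 - 2$ is a root whenever $\alpha$ is. The helper names `poly_mul`, `poly_mod`, and `compose_mod` are ours.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_mod(a, m):
    """Remainder of a modulo the MONIC polynomial m."""
    a = [Fraction(c) for c in a]
    while len(a) >= len(m):
        lead = a[-1]
        shift = len(a) - len(m)
        for i, c in enumerate(m):
            a[shift + i] -= lead * c
        a.pop()  # the leading coefficient is now zero
    return a

def compose_mod(f, g, m):
    """Compute f(g(x)) modulo m by Horner's rule."""
    result = [Fraction(0)]
    for c in reversed(f):
        result = poly_mod(poly_mul(result, g), m)
        result[0] += c
    return result

f = [1, -3, 0, 1]   # 1 - 3x + x^3
g = [-2, 0, 1]      # x^2 - 2
remainder = compose_mod(f, g, f)
print(remainder)    # every coefficient is zero
```

Working modulo $f$ is the same as computing in $\mathbb{Q}(\alpha_1)$, which is why exact fractions are used rather than floating point.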


We go back to an arbitrary irreducible cubic. By its operation on the roots, the Galois group $G$ of $K/F$ becomes a transitive subgroup of the symmetric group $S_3$ (16.4.2)(c). The transitive subgroups are $S_3$ and $A_3$, a cyclic group of order 3. If $[K:F] = 3$, then $G = A_3$, and if $[K:F] = 6$, then $G = S_3$. To distinguish these two cases, we need to decide whether or not the quadratic polynomial $q(x)$ that appears in (16.8.2) is irreducible over the field $L = F(\alpha_1)$. Working in the field $L$ is painful. We would rather make a computation in the field $F$. Fortunately, there is an element that makes it possible to decide, the square root $\delta$ of the discriminant (16.2.5) of $f$:

(16.8.4) $\delta = (\alpha_1 - \alpha_2)(\alpha_1 - \alpha_3)(\alpha_2 - \alpha_3)$.

Its main properties are:

• $\delta$ is an element of $K$,
• $\delta \neq 0$ (because the roots $\alpha_i$ are distinct), and
• a permutation of the roots multiplies $\delta$ by the sign of the permutation.


Theorem 16.8.5 Galois Theory for a Cubic. Let $K$ be the splitting field of an irreducible cubic polynomial $f$ over a field $F$, let $D$ be the discriminant of $f$, and let $G$ be the Galois group of $K/F$.

• If $D$ is a square in $F$, then $[K:F] = 3$ and $G$ is the alternating group $A_3$.
• If $D$ is not a square in $F$, then $[K:F] = 6$ and $G$ is the symmetric group $S_3$.

The discriminant of $x^3 + 3x + 1$ is $-5 \cdot 3^3$, not a square, while the discriminant of $x^3 - 3x + 1$ is $3^4$, a square (see (16.2.8)). This agrees with the discussion of the examples above.
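The two discriminants just quoted are easy to recompute. The following is a minimal sketch, assuming the standard formula $D = -4p^3 - 27q^2$ for a cubic $x^3 + px + q$ with no quadratic term (presumably what (16.2.8) records) and assuming that the cubic is already known to be irreducible over $\mathbb{Q}$. The function names are ours.

```python
from math import isqrt

def cubic_discriminant(p, q):
    """Discriminant of x^3 + p*x + q (standard formula, integer p and q)."""
    return -4 * p**3 - 27 * q**2

def is_square(n):
    """True if the integer n is a perfect square (equivalently, a square in Q)."""
    return n >= 0 and isqrt(n) ** 2 == n

def galois_group_of_cubic(p, q):
    """Theorem 16.8.5 over Q, ASSUMING x^3 + p*x + q is irreducible."""
    return "A3" if is_square(cubic_discriminant(p, q)) else "S3"

print(cubic_discriminant(3, 1), galois_group_of_cubic(3, 1))
print(cubic_discriminant(-3, 1), galois_group_of_cubic(-3, 1))
```

The function does not test irreducibility. For a reducible cubic the answer it returns has no meaning.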


Proof of Theorem 16.8.5. Since $D = \delta^2$, $D$ is a square in $F$ if and only if $\delta$ is in $F$. A permutation of the roots multiplies $\delta$ by the sign of the permutation. If $\delta$ is in $F$, it is fixed by every element of $G$. In that case odd permutations can't be in $G$, and therefore $G = A_3$ and $[K:F] = 3$. If $\delta$ isn't in $F$ then it isn't fixed by $G$, so $G$ contains an odd permutation. In that case, $G = S_3$ and $[K:F] = 6$. □


The alternating group $A_3$ has no proper subgroups. So if $G = A_3$ there are no proper intermediate fields. This is obvious, because $[K:F] = 3$ is a prime. The symmetric group $S_3$ has four proper subgroups. With the usual notation, they are the three groups $\langle y \rangle$, $\langle xy \rangle$, $\langle x^2y \rangle$ of order 2, and the group $\langle x \rangle$ of order 3, which is $A_3$. The Main Theorem tells us that when $G = S_3$, there are four proper intermediate fields. They are $F(\alpha_3)$, $F(\alpha_2)$, $F(\alpha_1)$, and $F(\delta)$.


16.9 QUARTIC EQUATIONS 


Let $f(x)$ be an irreducible quartic polynomial with coefficients in $F$, and let the roots of $f$ in a splitting field $K$ over $F$ be $\alpha_1, \alpha_2, \alpha_3, \alpha_4$. By its operation on the roots, the Galois group $G = G(K/F)$ is represented as a transitive subgroup of $S_4$ (16.6.6). The transitive subgroups are easy to determine because $S_4$ is isomorphic to the octahedral group, a rotation group. Any subgroup will be a rotation group too, so it will be one of the groups listed in Theorem 6.12.1. The transitive subgroups of $S_4$ are

(16.9.1) $S_4,\ A_4,\ D_4,\ C_4,\ D_2$.

There are three conjugate subgroups isomorphic to $D_4$, and three conjugate subgroups isomorphic to $C_4$. The subgroup $D_2$, the Klein four group, consists of the identity and the three products of disjoint transpositions. It is a normal subgroup of $S_4$ that we have seen before (2.5.15). (Some other subgroups of $S_4$ are isomorphic to $D_2$, but they aren't transitive.) Notice that the order of $G$, which is equal to the degree $[K:F]$, distinguishes all of these groups except the last two. Unfortunately, it isn't very easy to determine the degree.

We begin with a type of quartic polynomial that can be analyzed concretely. I learned this from Susan Landau [Landau].

Examples 16.9.2. Here $F$ denotes the field $\mathbb{Q}$ of rational numbers.

(a) Let $\alpha$ be the "nested" square root $\alpha = \sqrt{4 + \sqrt{5}}$. To determine the irreducible polynomial for $\alpha$ over $F$, we guess that its roots might be $\pm\alpha$ and $\pm\alpha'$, where $\alpha' = \sqrt{4 - \sqrt{5}}$. Having made this guess, we expand the polynomial

$f(x) = (x - \alpha)(x + \alpha)(x - \alpha')(x + \alpha') = x^4 - 8x^2 + 11$.

It isn't very hard to show that this polynomial is irreducible over $F$. We'll leave the proof as an exercise. So it is the irreducible polynomial for $\alpha$ over $F$. Let $K$ be the splitting field of $f$. Then

$F \subset F(\alpha) \subset F(\alpha, \alpha')$ and $F(\alpha, \alpha') = K$.

Since $f$ is irreducible, $[F(\alpha):F] = 4$, and since $\sqrt{5}$ is in $F(\alpha)$, $\alpha' = \sqrt{4 - \sqrt{5}}$ has degree at most 2 over $F(\alpha)$. We don't yet know whether or not $\alpha'$ is in the field $F(\alpha)$. In any case, $[K:F]$ is 4 or 8. The Galois group $G$ of $K/F$ also has order 4 or 8, so it is $D_4$, $C_4$, or $D_2$.

Which of the conjugate subgroups $D_4$ might operate depends on how we number the roots. Let's number them this way:

$\alpha_1 = \alpha,\quad \alpha_2 = \alpha',\quad \alpha_3 = -\alpha,\quad \alpha_4 = -\alpha'$.

With this ordering, an automorphism that sends $\alpha_1 \rightsquigarrow \alpha_i$ also sends $\alpha_3 \rightsquigarrow -\alpha_i$. The permutations with this property form the dihedral group $D_4$ generated by

(16.9.3) $\sigma = (1\,2\,3\,4)$ and $\tau = (2\,4)$.

Our Galois group is a subgroup of this group. It can be the whole group $D_4$, the cyclic group $C_4$ generated by $\sigma$, or the dihedral group $D_2$ generated by $\sigma^2$ and $\tau$.

Note: We must be careful: Every element of this group $D_4$ permutes the roots, but we don't yet know which of these permutations come from automorphisms of $K$. A permutation that doesn't come from an automorphism tells us nothing about $K$. □


There is one permutation, $\rho = \sigma^2 = (1\,3)(2\,4)$, that is in all three of the groups $D_4$, $C_4$, and $D_2$, so it extends to an $F$-automorphism of $K$ that we denote by $\rho$ too. This automorphism generates a subgroup $N$ of $G$ of order 2.

To compute the fixed field $K^N$, we look for expressions in the roots that are fixed by $\rho$. It isn't hard to find some: $\alpha^2 = 4 + \sqrt{5}$ and $\alpha\alpha' = \sqrt{11}$. So $K^N$ contains the field $L = F(\sqrt{5}, \sqrt{11})$. We inspect the chain of fields $F \subset L \subset K^N \subset K$. We have $[K:F] \le 8$, $[L:F] = 4$, and $[K:K^N] = 2$ (Fixed Field Theorem). It follows that $L = K^N$, that $[K:F] = 8$, and that $G$ is the dihedral group $D_4$.


(b) Let $\alpha = \sqrt{2 + \sqrt{2}}$. The irreducible polynomial for $\alpha$ over $F$ is $x^4 - 4x^2 + 2$. Its roots are $\alpha$, $\alpha' = \sqrt{2 - \sqrt{2}}$, $-\alpha$, $-\alpha'$ as before. Here $\alpha\alpha' = \sqrt{2}$, which is in the field $F(\alpha)$. Therefore $\alpha'$ is also in that field. The degree $[K:F]$ is 4, and $G$ is either $C_4$ or $D_2$.

Because the operation of $G$ on the roots is transitive, there is an element $\sigma'$ of $G$ that sends $\alpha \rightsquigarrow \alpha'$. Since $\alpha^2 = 2 + \sqrt{2}$ and $\alpha'^2 = 2 - \sqrt{2}$, $\sigma'$ sends $\sqrt{2} \rightsquigarrow -\sqrt{2}$ and $\alpha\alpha' \rightsquigarrow -\alpha\alpha'$. This implies that $\alpha' \rightsquigarrow -\alpha$. So $\sigma' = \sigma$. The Galois group is the cyclic group $C_4$.

(c) Let $\alpha = \sqrt{4 + \sqrt{7}}$. Its irreducible polynomial over $F$ is $x^4 - 8x^2 + 9$. Here $\alpha\alpha' = 3$. Again, $\alpha'$ is in the field $F(\alpha)$, and the degree $[K:F]$ is 4. If an automorphism $\sigma''$ sends $\alpha \rightsquigarrow \alpha'$, then since $\alpha\alpha' = 3$, it must send $\alpha' \rightsquigarrow \alpha$. The Galois group is $D_2$.

One can analyze any quartic polynomial of the form $x^4 + bx^2 + c$ in this way. □
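The three examples suggest a test that uses only the coefficients. The sketch below is not taken from the text. It assumes the following standard criterion for an irreducible $x^4 + bx^2 + c$ over $\mathbb{Q}$ with integer coefficients: the product $\alpha\alpha'$ is a square root of $c$, so $G = D_2$ when $c$ is a square; $\alpha\alpha'$ lies in $F(\alpha)$ without being rational when $c(b^2 - 4c)$ is a square, giving $C_4$; otherwise $G = D_4$. Irreducibility is assumed, not checked, and the function name is ours.

```python
from math import isqrt

def is_square(n):
    """True if the integer n is a perfect square."""
    return n >= 0 and isqrt(n) ** 2 == n

def biquadratic_galois_group(b, c):
    """Galois group over Q of x^4 + b*x^2 + c, ASSUMING it is irreducible."""
    if is_square(c):                      # alpha * alpha' is rational
        return "D2"
    if is_square(c * (b * b - 4 * c)):    # alpha * alpha' lies in Q(sqrt(b^2 - 4c))
        return "C4"
    return "D4"

for b, c in [(-8, 11), (-4, 2), (-8, 9)]:
    print((b, c), biquadratic_galois_group(b, c))
```

On the three polynomials of Examples 16.9.2 this reproduces the groups $D_4$, $C_4$, and $D_2$ found above.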
It is harder to analyze a general quartic

(16.9.4) $f(x) = x^4 - a_1x^3 + a_2x^2 - a_3x + a_4$,

because its roots $\alpha_1, \ldots, \alpha_4$ can rarely be written explicitly in a useful way. The main method is to look for expressions in the roots that are fixed by some, but not all, of the permutations in $S_4$. The square root of the discriminant $D$ is the first such expression:

$\delta = \prod_{i<j} (\alpha_i - \alpha_j) = (\alpha_1 - \alpha_2)(\alpha_1 - \alpha_3)(\alpha_1 - \alpha_4)(\alpha_2 - \alpha_3)(\alpha_2 - \alpha_4)(\alpha_3 - \alpha_4)$.

Because the roots are distinct, $\delta$ isn't zero, and as is true for cubic equations (16.8.4), a permutation $\sigma$ of the roots multiplies $\delta$ by the sign of the permutation. Even permutations fix $\delta$ and odd permutations do not fix $\delta$.


Proposition 16.9.5 Let $G$ be the Galois group of an irreducible quartic polynomial $f$. The discriminant $D$ of $f$ is a square in $F$ if and only if $G$ contains no odd permutation. Therefore
• If $D$ is a square in $F$, then $G$ is $A_4$ or $D_2$.
• If $D$ is not a square in $F$, then $G$ is $S_4$, $D_4$, or $C_4$.

Proof. $D$ is a square in $F$ if and only if $\delta$ is in $F$, which happens when every element of $G$ fixes $\delta$. The permutations that fix $\delta$ are the even permutations. The last statements are proved by looking at the list (16.9.1) of transitive subgroups of $S_4$. □

There is an analogous statement for splitting fields of a polynomial of any degree.

Proposition 16.9.6 Let $K$ be a splitting field over $F$ of an irreducible polynomial $f$ of degree $n$ in $F[x]$, and let $D$ be the discriminant of $f$. The Galois group $G(K/F)$ is a subgroup of the alternating group $A_n$ if and only if $D$ is a square in $F$. □


Lagrange found another useful expression in the roots $\alpha_i$, one that is special to quartic polynomials. Let

(16.9.7) $\beta_1 = \alpha_1\alpha_2 + \alpha_3\alpha_4,\quad \beta_2 = \alpha_1\alpha_3 + \alpha_2\alpha_4,\quad \beta_3 = \alpha_1\alpha_4 + \alpha_2\alpha_3$,

and let

$g(x) = (x - \beta_1)(x - \beta_2)(x - \beta_3)$.

This polynomial is called the resolvent cubic of $f$. Every permutation of the roots $\alpha_i$ permutes the elements $\beta_j$, so the coefficients of $g$ are symmetric functions in the roots. They are elements of $F$ that can be computed when needed.

By a lucky accident, the fact that the roots of an irreducible quartic are distinct implies that the elements $\beta_j$ are also distinct. For instance,

$\beta_1 - \beta_2 = \alpha_1\alpha_2 + \alpha_3\alpha_4 - \alpha_1\alpha_3 - \alpha_2\alpha_4 = (\alpha_1 - \alpha_4)(\alpha_2 - \alpha_3)$.

Since the $\alpha_i$ are distinct, $\beta_1 - \beta_2$ isn't zero. The discriminants of the polynomials $f$ and $g$ are actually equal.

Whether or not the resolvent cubic has a root in $F$ gives us more information about the Galois group $G$.


Proposition 16.9.8 Let $G$ be the Galois group of an irreducible quartic polynomial $f$ over $F$, and let $g$ be the resolvent cubic of $f$. Then $g$ is irreducible if and only if the order of $G$ is divisible by 3. Moreover,

• If $g$ splits completely in $F$, then $G = D_2$.
• If $g$ has one root in $F$, then $G = D_4$ or $C_4$.
• If $g$ is irreducible over $F$, then $G = S_4$ or $A_4$.

Proof. The proof of the proposition is simple, but the fact that the three elements $\beta_j$ are distinct is an essential point that could easily be overlooked. Let $B$ denote the set $\{\beta_1, \beta_2, \beta_3\}$. It has order 3. The operation of the symmetric group $S_4$ on the roots $\alpha_i$ defines a transitive operation on $B$, and the associated permutation representation is a homomorphism $\varphi: S_4 \to S_3$ that we have seen before (2.5.13). Its kernel is the subgroup $D_2$. If $g$ splits completely in $F$, the Galois group operates trivially on $B$, and therefore $G = D_2$.

If $g$ is irreducible over $F$, $G$ operates transitively on $B$ (16.6.6), so its order is divisible by three. Conversely, if $|G|$ is divisible by three, then $G$ contains an element of order 3, say $\rho$. Since the kernel of $\varphi$ is $D_2$, $\rho$ does not operate trivially on $B$. It permutes the three elements cyclically. Therefore $G$ operates transitively on $B$, and $g$ is irreducible.

The rest of the proposition follows by looking back at the list (16.9.1). □


Thus the polynomials $x^2 - D$, where $D$ is the discriminant, and the resolvent cubic $g(x)$ nearly suffice to describe the Galois group. The results are summed up in this table:

(16.9.9)
                    $D$ a square     $D$ not a square
  $g$ reducible     $G = D_2$        $G = D_4$ or $C_4$
  $g$ irreducible   $G = A_4$        $G = S_4$

Unfortunately, there is no simple expression in the roots that removes the remaining ambiguity (see Exercise M.11).
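Table (16.9.9) can be turned into a small procedure for quartics with integer coefficients. This is a sketch under stated assumptions, not a method given in the text: $f$ is assumed irreducible over $\mathbb{Q}$; the coefficients of $g$ come from the standard symmetric-function identities $\sum \beta_i = a_2$, $\sum_{i<j} \beta_i\beta_j = a_1a_3 - 4a_4$, $\beta_1\beta_2\beta_3 = a_1^2a_4 + a_3^2 - 4a_2a_4$; and the discriminant of $f$ is computed as the discriminant of $g$, using the equality stated above. All function names are ours.

```python
from math import isqrt

def is_square(n):
    return n >= 0 and isqrt(n) ** 2 == n

def resolvent_cubic(a1, a2, a3, a4):
    """(b, c, d) with g(x) = x^3 + b x^2 + c x + d, for
    f(x) = x^4 - a1 x^3 + a2 x^2 - a3 x + a4 as in (16.9.4)."""
    return (-a2, a1 * a3 - 4 * a4, -(a1 * a1 * a4 + a3 * a3 - 4 * a2 * a4))

def cubic_discriminant(b, c, d):
    """Discriminant of x^3 + b x^2 + c x + d (standard formula)."""
    return (b * b * c * c - 4 * c**3 - 4 * b**3 * d
            - 27 * d * d + 18 * b * c * d)

def integer_roots(b, c, d):
    """Rational roots of a monic integer cubic are integers; scan up to the Cauchy bound."""
    bound = 1 + max(abs(b), abs(c), abs(d))
    return [r for r in range(-bound, bound + 1)
            if r**3 + b * r * r + c * r + d == 0]

def classify_quartic(a1, a2, a3, a4):
    """Read off table (16.9.9), ASSUMING f is irreducible over Q."""
    b, c, d = resolvent_cubic(a1, a2, a3, a4)
    roots = integer_roots(b, c, d)
    if len(roots) == 3:
        return "D2"
    if len(roots) == 1:
        return "D4 or C4"
    return "A4" if is_square(cubic_discriminant(b, c, d)) else "S4"

print(classify_quartic(0, 0, 0, -2))   # x^4 - 2
print(classify_quartic(0, -8, 0, 9))   # x^4 - 8x^2 + 9
print(classify_quartic(0, 0, 1, -1))   # x^4 - x - 1
```

As the table says, the procedure cannot separate $D_4$ from $C_4$. For $x^4 - 2$ it reports the ambiguous case, and other information is needed to decide.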


Note: The proof of Proposition 16.9.8 makes use of the particular formulas (16.9.7) to define a permutation of the set $B$ in terms of a permutation of the roots $\alpha_i$. If a permutation of the roots comes from an $F$-automorphism, the permutation of $B$ will be given by that automorphism. However, if the permutation doesn't come from an $F$-automorphism, the permutation of $B$ defined using the formulas has no meaning for the field.

For example, let $K$ be the splitting field of the polynomial $x^4 - 2$ over $\mathbb{Q}$. We index the roots from 1 to 4 in the order $\alpha_1 = \alpha$, $\alpha_2 = i\alpha$, $\alpha_3 = -\alpha$, $\alpha_4 = -i\alpha$, where $\alpha$ is the positive real fourth root of 2. Then $\beta_1 = 2i\sqrt{2}$, $\beta_2 = 0$, $\beta_3 = -2i\sqrt{2}$. The transposition $\epsilon = (1\,2)$ isn't an element of the Galois group. When we use the formulas (16.9.7) to define how $\epsilon$ permutes the set $B$, the operation we obtain switches $\beta_2$ and $\beta_3$. Since $\beta_2 = 0$ and $\beta_3 \neq 0$, this permutation makes no sense algebraically.
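The values of the $\beta_j$ in this example, and the effect of the transposition $(1\,2)$ on them, can be confirmed numerically. The sketch below is only a floating-point illustration; the helper names are ours.

```python
import math

alpha = 2 ** 0.25                                    # positive real fourth root of 2
roots = [alpha, 1j * alpha, -alpha, -1j * alpha]     # alpha_1, ..., alpha_4

def betas(a):
    """The three elements (16.9.7) for a list of four roots."""
    a1, a2, a3, a4 = a
    return [a1 * a2 + a3 * a4, a1 * a3 + a2 * a4, a1 * a4 + a2 * a3]

def swap_first_two(a):
    """Apply the transposition (1 2) to the indexing of the roots."""
    return [a[1], a[0], a[2], a[3]]

before = betas(roots)
after = betas(swap_first_two(roots))
print(before)   # approximately [2.83j, 0, -2.83j]
print(after)    # beta_2 and beta_3 have traded places
```

The swap exchanges the zero element $\beta_2$ with the nonzero element $\beta_3$, which no field automorphism could do.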


16.10 ROOTS OF UNITY 


In this section, $F$ denotes the field $\mathbb{Q}$ of rational numbers. The subfield of the complex numbers generated over $F$ by an $n$th root of unity $\zeta_n = e^{2\pi i/n}$ is called a cyclotomic field. We'll assume that $n$ is a prime integer $p$. The irreducible polynomial for $\zeta = e^{2\pi i/p}$ over the rational numbers is

(16.10.1) $f(x) = x^{p-1} + x^{p-2} + \cdots + x + 1$

(Theorem 12.4.9). Its roots are the powers $\zeta, \zeta^2, \ldots, \zeta^{p-1}$, so $\zeta$ generates the splitting field of $f$, and therefore $K = F(\zeta)$ is a Galois extension of $F$ of degree $p - 1$.


Proposition 16.10.2

(a) Let $p$ be a prime, and let $\zeta = e^{2\pi i/p}$. The Galois group of $\mathbb{Q}(\zeta)$ over $\mathbb{Q}$ is a cyclic group of order $p - 1$. It is isomorphic to the multiplicative group $\mathbb{F}_p^\times$ of nonzero elements of the prime field $\mathbb{F}_p$.

(b) For any subfield $F'$ of $\mathbb{C}$, the Galois group of $F'(\zeta)$ over $F'$ is a cyclic group.

Proof. (a) With $F = \mathbb{Q}$, let $G$ be the Galois group of $F(\zeta)$ over $F$. An element $\sigma$ of $G$ is determined by the image $\sigma(\zeta)$, which can be any one of the $p - 1$ roots of $f$. Let's call $\sigma_i$ the element such that $\sigma_i(\zeta) = \zeta^i$. The exponent $i$ is determined as a nonzero residue modulo $p$ because $\zeta^p = 1$. So sending $\sigma_i \rightsquigarrow i$ defines a bijective map $\epsilon: G \to \mathbb{F}_p^\times$. The computation

$\sigma_i\sigma_j(\zeta) = \sigma_i(\zeta^j) = \sigma_i(\zeta)^j = \zeta^{ij}$

shows that $\epsilon$ is a homomorphism, and therefore an isomorphism. The fact that $\mathbb{F}_p^\times$ is cyclic is a part of Theorem 15.7.3.

The element $\sigma_\nu$ that sends $\zeta \rightsquigarrow \zeta^\nu$ generates $G$ if and only if $\nu$ is a primitive root modulo $p$, a generator for the cyclic group $\mathbb{F}_p^\times$.

(b) An element $\sigma$ of the Galois group $G' = G(F'(\zeta)/F')$ will also send $\zeta$ to a power $\zeta^\nu$. The proof above shows that $G'$ is isomorphic to a subgroup of the cyclic group $\mathbb{F}_p^\times$. Therefore it is a cyclic group too. □


Example 16.10.3. $p = 17$ and $\zeta = e^{i\theta}$, where $\theta = 2\pi/17$.
The residue of 3 is a primitive root modulo 17, so the Galois group $G = G(K/F)$ is a cyclic group of order 16, generated by the automorphism $\sigma$ that sends $\zeta \rightsquigarrow \zeta^3$. There are five subgroups, of orders 16, 8, 4, 2, and 1, generated by $\sigma$, $\sigma^2$, $\sigma^4$, $\sigma^8$, and 1, respectively. Let the fixed fields of the subgroups be $F = L_0 = K^{\langle\sigma\rangle}$, $L_1 = K^{\langle\sigma^2\rangle}$, $L_2 = K^{\langle\sigma^4\rangle}$, $L_3 = K^{\langle\sigma^8\rangle}$, and $L_4 = K$. They form a chain of fields $L_0 \subset L_1 \subset L_2 \subset L_3 \subset L_4$, where the degree of each extension $L_i/L_{i-1}$ is 2. The Main Theorem tells us that these are the only intermediate fields.


Lemma 16.10.4 The field $L_3$ defined above is generated by $\cos\theta$, and it has degree 8 over $F$.

Proof. Let $L' = F(\cos\theta)$. Since $\zeta + \zeta^{-1} = 2\cos\theta$, $\cos\theta$ is in $K = F(\zeta)$. Moreover, $\zeta$ is a root of the quadratic polynomial $(x - \zeta)(x - \zeta^{-1}) = x^2 - 2(\cos\theta)x + 1$, which has coefficients in $L'$, so $[K:L'] \le 2$ and $[L':F] \ge 8$. Therefore $L'$ is either $L_3$ or $K$, and since $L'$ is a subfield of $\mathbb{R}$ but $K$ is not, $L' = L_3$. □

Corollary 16.10.5 The regular 17-gon can be constructed with ruler and compass.

Proof. The chain $F \subset L_1 \subset L_2 \subset L_3$ shows that we can reach the field $L_3$, which contains $\cos\theta$, by a sequence of three successive square root adjunctions, and since $L_3$ is a subfield of $\mathbb{R}$, these square roots are real. (See (15.5.10).) □
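The three square root adjunctions can be made concrete numerically. The sketch below is not carried out in the text. It assumes the classical Gauss period relations: the 8-term sums are the roots of $x^2 + x - 4$ (this is (16.10.11)), the 4-term sums $2\cos\theta + 2\cos 4\theta$ and $2\cos 3\theta + 2\cos 5\theta$ are the larger roots of $x^2 - a_1x - 1$ and $x^2 - a_2x - 1$, and $2\cos\theta$, $2\cos 4\theta$ have sum $b_1$ and product $b_3$. The variable names are ours.

```python
import math

theta = 2 * math.pi / 17

def larger_root(s, t):
    """Larger real root of x^2 - s*x + t (one real square root)."""
    return (s + math.sqrt(s * s - 4 * t)) / 2

# Step 1 (F to L1): roots of x^2 + x - 4, i.e. sum -1 and product -4.
a1 = larger_root(-1, -4)
a2 = -1 - a1

# Step 2 (L1 to L2): 4-term sums, each pair with product -1.
b1 = larger_root(a1, -1)     # 2cos(theta) + 2cos(4 theta)
b3 = larger_root(a2, -1)     # 2cos(3 theta) + 2cos(5 theta)

# Step 3 (L2 to L3): 2cos(theta) and 2cos(4 theta) have sum b1, product b3.
c1 = larger_root(b1, b3)
cos_theta = c1 / 2

print(cos_theta, math.cos(theta))
```

Each call to `larger_root` takes one real square root, matching the three quadratic steps in the proof of Corollary 16.10.5.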


The next lemma is useful for describing the quadratic extension $L_1$ of $F$:

Lemma 16.10.6 Let $\alpha = c_1\zeta + c_2\zeta^2 + \cdots + c_{p-1}\zeta^{p-1}$ be a linear combination with rational coefficients $c_i$, where $\zeta = e^{2\pi i/p}$ and $p$ is prime. If $\alpha$ is a rational number, then $c_1 = c_2 = \cdots = c_{p-1}$, and $\alpha = -c_1$.

Proof. Since $\zeta$ is a root of $f$ (16.10.1), we can solve for $\zeta^{p-1}$ and rewrite the given linear combination as $\alpha = (-c_{p-1})1 + (c_1 - c_{p-1})\zeta + \cdots + (c_{p-2} - c_{p-1})\zeta^{p-2}$. Because the powers $1, \zeta, \ldots, \zeta^{p-2}$ form a basis for $K$ over $F$, this combination is a rational number only if all coefficients except $-c_{p-1}$ are equal to zero. If so, then $c_i = c_{p-1}$ for every $i$ and $\alpha = -c_1$, as asserted. □


Example 16.10.7 The case $p = 17$, continued.
The powers of the primitive root 3 modulo 17, listed in order, and with representatives for the congruence classes taken between $-8$ and 8, are

(16.10.8) $1, 3, -8, -7, -4, 5, -2, -6, -1, -3, 8, 7, 4, -5, 2, 6$.

The automorphism $\sigma$ of $K = F(\zeta)$ that sends $\zeta$ to $\zeta^3$ generates the Galois group $G$, and it runs through the powers of $\zeta$ in the corresponding order:

(16.10.9) $\zeta \rightsquigarrow \zeta^3 \rightsquigarrow \zeta^{-8} \rightsquigarrow \zeta^{-7} \rightsquigarrow \cdots$

The $G$-orbit of $\zeta$ consists of the 16 powers of $\zeta$ different from 1.
Let $H$ denote the subgroup $\langle\sigma^2\rangle$ of order 8. The $G$-orbit of $\zeta$ splits into two $H$-orbits that are obtained by taking every other term in the sequence of powers (16.10.9):

$\{\zeta, \zeta^{-8}, \zeta^{-4}, \ldots\}$ and $\{\zeta^3, \zeta^{-7}, \zeta^5, \ldots\}$.

Let $\alpha_1$ and $\alpha_2$ denote the sums over these two orbits, respectively: $\alpha_1 = \zeta + \zeta^{-8} + \cdots$. The set $\{\alpha_1, \alpha_2\}$ is a $G$-orbit. Theorem 16.5.2 tells us that the elements $\alpha_i$ have degree 2 over the fixed field of $G$, which is $F$, and that the irreducible polynomial for $\alpha_i$ over $F$ is $(x - \alpha_1)(x - \alpha_2)$. To determine this polynomial, we need to compute the two symmetric functions $s_1(\alpha) = \alpha_1 + \alpha_2$ and $s_2(\alpha) = \alpha_1\alpha_2$.

To begin with, we note that $s_1(\alpha)$ is the sum of all powers of $\zeta$ different from 1, so $s_1(\alpha) = -1$ (16.10.6). Next,

$s_2(\alpha) = \alpha_1\alpha_2 = (\zeta + \zeta^{-8} + \cdots)(\zeta^3 + \zeta^{-7} + \cdots)$.

Writing $\alpha_i$ requires writing $\zeta$ many times, so we use a shorthand. We write

(16.10.10) $\alpha_1 = [1, -8, -4, -2, -1, 8, 4, 2],\quad \alpha_2 = [3, -7, 5, -6, -3, 7, -5, 6]$.

This notation indicates that $\alpha_1$ is the sum of the powers of $\zeta$ whose exponents are in the first bracketed string. To compute $s_2(\alpha)$, we must add each of the eight terms in the first string to those in the second string, modulo $p$, obtaining 64 exponents. Then $s_2(\alpha)$ will be the sum of the corresponding powers of $\zeta$. Let's not do this explicitly. Since $s_2(\alpha)$ is a rational number, all powers different from $\zeta^0 = 1$ must occur the same number of times (16.10.6). We notice that we won't get any zeros when we do the addition, because a residue and its negative are in the same bracketed sequence. So the 64 terms must include four of each of the 16 nonzero exponents. Therefore $s_2(\alpha) = -4$. The irreducible polynomial for $\alpha_i$ over $F$ is

(16.10.11) $(x - \alpha_1)(x - \alpha_2) = x^2 + x - 4$.

Its discriminant is 17, so $L_1 = F(\sqrt{17})$. □
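The bookkeeping in this example is easy to check by machine. The sketch below regenerates the list (16.10.8), splits it into the two strings of (16.10.10), does explicitly the 64 additions that the text skips, and confirms $s_1 = -1$ and $s_2 = -4$ in floating point. The helper `centered` and the variable names are ours.

```python
import cmath
from collections import Counter

p, g = 17, 3

def centered(k, p):
    """Representative of k modulo p lying between -(p-1)/2 and (p-1)/2."""
    k %= p
    return k - p if k > p // 2 else k

powers = [centered(pow(g, n, p), p) for n in range(p - 1)]   # list (16.10.8)
orbit1, orbit2 = powers[0::2], powers[1::2]                  # strings (16.10.10)

zeta = cmath.exp(2j * cmath.pi / p)
alpha1 = sum(zeta ** k for k in orbit1)
alpha2 = sum(zeta ** k for k in orbit2)

# The 64 exponents obtained by adding a term of one string to a term of the other.
exponent_counts = Counter((a + b) % p for a in orbit1 for b in orbit2)

print(powers)
print(alpha1 + alpha2, alpha1 * alpha2)
print(sorted(exponent_counts.items()))
```

Each nonzero residue appears exactly four times and zero never appears, as predicted from Lemma 16.10.6.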


One can determine the extension field of degree 2 over $F$ that is contained in the cyclotomic field $F(\zeta_p)$ for any odd prime $p$ in the same way.

Theorem 16.10.12 Let $p$ be a prime different from 2, and let $L$ be the unique quadratic extension of $\mathbb{Q}$ contained in the cyclotomic field $\mathbb{Q}(\zeta_p)$. If $p \equiv 1$ modulo 4, then $L = \mathbb{Q}(\sqrt{p})$, and if $p \equiv 3$ modulo 4, then $L = \mathbb{Q}(\sqrt{-p})$.

This seems to be an occasion for "proof by example." The case that $p \equiv 1$ modulo 4 is illustrated by the prime 17, and the computation is analogous for any such prime. We'll illustrate the case $p \equiv 3$ modulo 4 by the prime 11. The residue of 2 is a primitive root modulo 11. Its powers list the nonzero residue classes modulo 11 in the order

$1, 2, 4, -3, 5, -1, -2, -4, 3, -5$.

Let $\zeta = \zeta_{11}$ and let $\sigma$ be the automorphism that sends $\zeta \rightsquigarrow \zeta^2$. With shorthand notation as above, the orbit sums of $\sigma^2$ are

$\alpha_1 = [1, 4, 5, -2, 3],\quad \alpha_2 = [2, -3, -1, -4, -5]$.

Here if $k$ is in the list of exponents for the sum $\alpha_1$, then $-k$ is in the list for $\alpha_2$. Therefore zero occurs five times among the 25 terms in the list of exponents for $\alpha_1\alpha_2$, and this contributes 5 to $\alpha_1\alpha_2$. Since $\alpha_1\alpha_2$ is in $\mathbb{Q}$, the 20 remaining terms must consist of two of each of the 10 nonzero congruence classes modulo 11. The sum of these terms contributes $-2$. Therefore $\alpha_1\alpha_2 = 3$. The irreducible polynomial for $\alpha_i$ is $x^2 + x + 3$. Its discriminant is $-11$. □

Theorem (16.10.12) is a special case of a beautiful theorem of algebraic number theory.

Theorem 16.10.13 Kronecker-Weber Theorem. Every Galois extension of the field $\mathbb{Q}$ of rational numbers whose Galois group is abelian is contained in one of the cyclotomic fields $\mathbb{Q}(\zeta_n)$. □
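Returning to Theorem 16.10.12, the "proof by example" can be repeated for other odd primes with exact integer arithmetic. The sketch below follows the method of the examples: it splits the powers of a primitive root into two strings, counts the exponents that occur in $\alpha_1\alpha_2$, and returns the discriminant $1 - 4s_2$ of $x^2 + x + s_2$. The brute-force search for a primitive root and the function names are our own choices.

```python
from collections import Counter

def primitive_root(p):
    """Smallest primitive root modulo the odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, n, p) for n in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")

def quadratic_subfield_discriminant(p):
    """Discriminant of the quadratic polynomial satisfied by the two orbit sums."""
    g = primitive_root(p)
    powers = [pow(g, n, p) for n in range(p - 1)]
    orbit1, orbit2 = powers[0::2], powers[1::2]
    counts = Counter((a + b) % p for a in orbit1 for b in orbit2)
    zeros = counts[0]
    # Lemma 16.10.6: every nonzero exponent occurs equally often.
    multiplicities = {counts[k] for k in range(1, p)}
    assert len(multiplicities) == 1
    # zeta^0 = 1, and the nonzero powers of zeta sum to -1.
    s2 = zeros - multiplicities.pop()
    return 1 - 4 * s2

for p in [3, 5, 7, 11, 13, 17, 19, 23]:
    print(p, quadratic_subfield_discriminant(p))
```

For each prime tried, the discriminant is $p$ when $p \equiv 1$ modulo 4 and $-p$ when $p \equiv 3$ modulo 4. This is evidence for the theorem, not a proof.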


16.11 KUMMER EXTENSIONS 


This section is devoted to the following theorem: 


Theorem 16.11.1 Let $F$ be a subfield of $\mathbb{C}$ that contains the $p$th root of unity $\zeta = e^{2\pi i/p}$, $p$ prime, and let $K/F$ be a Galois extension of degree $p$. Then $K$ is obtained by adjoining a $p$th root. In other words, $K$ is generated over $F$ by an element $\beta$, with $\beta^p$ in $F$.

Extensions of this type are often called Kummer extensions. The Galois group of a Kummer extension is a cyclic group of prime order.

The theorem is familiar for $p = 2$: Every extension of degree 2 can be obtained by adjoining a square root. But suppose that $p = 3$ and that $F$ contains the cube root of unity $\omega = e^{2\pi i/3}$. If the discriminant of the irreducible cubic polynomial $f$ (16.2.7) is a square in $F$, then the splitting field of $f$ has degree 3 (16.8.5). The theorem asserts that the splitting field has the form $F(\sqrt[3]{b})$, for some $b$ in $F$. This isn't obvious. If the discriminant is not a square, the roots cannot be obtained by adjoining a cube root. (This is Exercise 11.1.)

The next proposition completes the picture. Suppose that $\beta$ is the $p$th root of a nonzero element $b$ of $F$ in an extension field $K$. Then it will be a root of the polynomial $g(x) = x^p - b$, and if $\zeta$ is in $F$, the roots of $g$ in $K$ will be $\zeta^\nu\beta$ for $\nu = 0, 1, \ldots, p - 1$. So $\beta$ will generate the splitting field of $g$ over $F$.


Proposition 16.11.2 Let $p$ be a prime, let $F$ be a field that contains the $p$th root of unity $\zeta = e^{2\pi i/p}$, and let $b$ be a nonzero element of $F$. The polynomial $g(x) = x^p - b$ is either irreducible over $F$, or else it splits completely.

Proof. Let $K$ be a splitting field of $g$ over $F$, and suppose that some root $\beta$ of $g$ is not in $F$. Then the degree $[K:F]$ will be greater than 1, so the Galois group $G = G(K/F)$ will contain an element $\sigma$ different from the identity. Since $\beta$ generates $K$ over $F$, $\sigma(\beta)$ cannot be equal to $\beta$. So $\sigma(\beta) = \zeta^\nu\beta$ for some $\nu$ with $0 < \nu < p$. We also have $\sigma(\zeta) = \zeta$. Therefore $\sigma^2(\beta) = \zeta^\nu(\zeta^\nu\beta) = \zeta^{2\nu}\beta$, and in general, $\sigma^k(\beta) = \zeta^{k\nu}\beta$. Since $0 < \nu < p$ and $p$ is prime, the multiples of $\nu$ run through all residues modulo $p$. This shows that $G$ operates transitively on the $p$ roots of $g$. Therefore $g$ is irreducible over $F$. □


Proof of Theorem (16.11.1). The proof is nice. We view $K$ as a vector space over $F$, and we verify that an element $\sigma$ of the Galois group $G$ is a linear operator on $K$: If $\alpha$ and $\beta$ are in $K$ and $c$ is in $F$, then $\sigma(c) = c$. Since $\sigma$ is an automorphism,

$\sigma(\alpha + \beta) = \sigma(\alpha) + \sigma(\beta)$ and $\sigma(c\alpha) = \sigma(c)\sigma(\alpha) = c\,\sigma(\alpha)$.

We choose a generator $\sigma$ for the cyclic Galois group $G$. Then $\sigma^p = 1$, so any eigenvalue $\lambda$ of $\sigma$ must satisfy the relation $\lambda^p = 1$, which means that $\lambda$ is a power of $\zeta$. These eigenvalues are in the field $F$ by hypothesis. Moreover, a linear operator of order $p$ has at least one eigenvalue different from 1. This is because, over the complex numbers, the matrix of $\sigma$ is diagonalizable (see Theorem 4.7.14 or Corollary (10.3.9)). Its eigenvalues are the entries of the corresponding diagonal matrix $\Lambda$. If $\sigma$ is not the identity, then $\Lambda \neq I$, so some diagonal entry must be different from 1.

Let $\beta$ be an eigenvector of $\sigma$ with eigenvalue $\lambda \neq 1$, and let $b = \beta^p$. Then $\sigma(\beta) = \lambda\beta$, hence $\sigma(b) = (\lambda\beta)^p = b$. Since $\sigma$ generates $G$, $b$ is in the fixed field, which is $F$, while $\beta$ is not in $F$. Since $[K:F]$ is prime, $F(\beta) = K$. □


With notation as in Theorem 16.11.1, say that $K$ is the splitting field over $F$ of an irreducible polynomial $f$ of degree $p$. There is a simple expression in the roots of $f$ that often yields an eigenvector for the operator $\sigma$. The permutation of the roots $\alpha_1, \ldots, \alpha_p$ of $f$ that is defined by $\sigma$ will be cyclic, so if we number the roots appropriately, $\sigma$ will be the permutation $(1\,2\,\cdots\,p)$. Let $\lambda$ be an eigenvalue of $\sigma$, and let

(16.11.3) $\beta = \alpha_1 + \lambda\alpha_2 + \cdots + \lambda^{p-1}\alpha_p$.

Then $\sigma(\beta) = \alpha_2 + \lambda\alpha_3 + \cdots + \lambda^{p-2}\alpha_p + \lambda^{p-1}\alpha_1 = \lambda^{-1}\beta$. So unless $\beta$ happens to be zero, it will be an eigenvector with eigenvalue $\lambda^{-1}$.


Example 16.11.4 Kummer's theorem leads to a formula for the roots of a cubic polynomial that was discovered in the sixteenth century by Cardano and Tartaglia. The derivation that we outline here isn't as short as Cardano's, but it is easier to remember because it is systematic. We suppose that the quadratic coefficient of the cubic is zero, and to avoid denominators in the solution, we write it as

$f(x) = x^3 + 3px + 2q$.

Then $s_1 = 0$, $s_2 = 3p$, $s_3 = -2q$, and the discriminant is $D = -2^2 3^3 (q^2 + p^3)$.
Let the roots be $u_1, u_2, u_3$, numbered arbitrarily. With $\omega = e^{2\pi i/3}$, the elements

$z = \tfrac{1}{3}(u_1 + \omega u_2 + \omega^2 u_3)$ and $z' = \tfrac{1}{3}(u_1 + \omega^2 u_2 + \omega u_3)$

are eigenvectors for the cyclic permutation $\sigma = (1\,2\,3)$. Since $1 + \omega + \omega^2 = 0$,

$z + z' = \tfrac{1}{3}s_1 + z + z' = u_1$.

The cubes $z^3$ and $z'^3$ are fixed by $\sigma$, so according to Kummer's Theorem and Theorem 16.8.5, they can be written in terms of $p$, $q$, $\delta = \sqrt{D}$, and $\omega$. When the cubes are written in this way, $u_1 = z + z'$ will be expressed as a sum of cube roots.
One makes the following computations. Let

$A = u_1^2u_2 + u_2^2u_3 + u_3^2u_1$,
$B = u_2^2u_1 + u_3^2u_2 + u_1^2u_3$.

Then

$A - B = (u_1 - u_2)(u_1 - u_3)(u_2 - u_3) = \delta$,
$A + B = s_1s_2 - 3s_3 = 6q$.

Also, $u_1^3 + u_2^3 + u_3^3 = s_1^3 - 3s_1s_2 + 3s_3 = -6q$.
One solves for $A$, $B$ and expands $z^3$ and $z'^3$. The result of this computation is Cardano's formula:

(16.11.5) $u = \sqrt[3]{-q + \sqrt{q^2 + p^3}} + \sqrt[3]{-q - \sqrt{q^2 + p^3}}$.

For instance, if $f(x) = x^3 + 3x + 2$, then $x = \sqrt[3]{-1 + \sqrt{2}} + \sqrt[3]{-1 - \sqrt{2}}$.

However, the formula is ambiguous. In the term $\sqrt[3]{-q + \sqrt{q^2 + p^3}}$, the square root can take two values, and when a square root is chosen, there are three possible values for the cube root, giving six ways to read that term. There are also six ways to read the other term. But $f$ has only three roots. □
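One way to see how the ambiguity can be resolved is to try every reading of the two cube roots and keep only the compatible pairs. The sketch below is our own illustration, not a method from the text. It assumes $p \neq 0$ and uses the relation $zz' = \tfrac{1}{9}(s_1^2 - 3s_2) = -p$, which follows from the definitions of $z$ and $z'$ above, as the compatibility condition. The tolerance `1e-9` is an ad hoc choice suited to coefficients of moderate size.

```python
import cmath

def cube_roots(w):
    """The three complex cube roots of w."""
    r = abs(w) ** (1 / 3)
    phi = cmath.phase(w)
    return [cmath.rect(r, (phi + 2 * cmath.pi * k) / 3) for k in range(3)]

def cardano_roots(p, q):
    """Roots of x^3 + 3*p*x + 2*q for p != 0, keeping only the pairs
    of cube roots (z, z2) whose product is -p."""
    s = cmath.sqrt(q * q + p ** 3)
    roots = []
    for z in cube_roots(-q + s):
        for z2 in cube_roots(-q - s):
            if abs(z * z2 + p) < 1e-9:
                roots.append(z + z2)
    return roots

roots = cardano_roots(1, 1)          # f(x) = x^3 + 3x + 2
for x in roots:
    print(x, abs(x**3 + 3 * x + 2))
```

Of the nine pairings of cube roots, exactly three satisfy the condition, and they give the three roots of $f$. Interchanging the two values of the square root only swaps the roles of $z$ and $z'$.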


16.12 QUINTIC EQUATIONS 


The main motivation behind Galois’s work was the problem of solving fifth-degree equations. 
A short time earlier, Abel had shown that the quintic equation 


(16.12.1) $x^5 - a_1x^4 + a_2x^3 - a_3x^2 + a_4x - a_5 = 0$


with variable coefficients a; couldn’t be solved by radicals, but no equation with integer 
coefficients that couldn’t be solved was known. Anyhow, the problem was over 200 years 
old, and it continued to interest people. In the meantime Galois’s ideas have turned out to 
be much more important than the problem that motivated them. It is amazing that Galois 
was able to do what he did before the concept of a group was developed. 


Proposition 16.12.2 Let $F$ be a subfield of the complex numbers. The following two conditions on a complex number $\alpha$ are equivalent, and $\alpha$ is called solvable over $F$ if it satisfies either one of them:

(a) There is a chain of subfields $F = F_0 \subset F_1 \subset \cdots \subset F_r = K$ of $\mathbb{C}$ such that $\alpha$ is in $K$, and
• for $j = 1, \ldots, r$, $F_j = F_{j-1}(\beta_j)$, where a power of $\beta_j$ is in $F_{j-1}$.

(b) There is a chain of subfields $F = F_0 \subset F_1 \subset \cdots \subset F_s = K$ of $\mathbb{C}$ such that $\alpha$ is in $K$, and
• for $j = 1, \ldots, s$, $F_j$ is a Galois extension of $F_{j-1}$ of prime degree.


The proof of the proposition isn’t difficult, but it doesn’t have much intrinsic interest, so 
we defer it to the end of the section. We need condition (b) in order to be able to use 
Galois theory. It is the more important characterization of solvability, and one can avoid the 
technicality of the proposition by accepting it as the definition. 


Condition (a) means that $F_j$ is generated over $F_{j-1}$ by an $n$th root for some integer $n$ (that depends on $j$). It is similar to the description of the real numbers that can be constructed by ruler and compass. In that description, only square roots of positive real numbers are allowed. Theoretically, one could unravel the extensions to write a solvable element $\alpha$ using a succession of nested roots. But as with Cardano's solution of the cubic equation, there is a great deal of ambiguity in a formula involving radicals, because there are $n$ choices for an $n$th root. It is useless to write a root explicitly as a complicated expression in radicals. Indeed, Cardano's formula is useless.


Proposition 16.12.3 If $\alpha$ is a root of a polynomial of degree at most four with coefficients in a field $F$, then $\alpha$ is solvable over $F$.

Proof. For quadratic polynomials, the quadratic formula proves this. For cubics, Cardano's formula (16.11.5) gives the solution. If $f(x)$ is quartic, we begin by adjoining the square root $\delta$ of $D$. Then we use Cardano's formula to solve for a root of the resolvent cubic $g(x)$, and we adjoin it. At this point, Table 16.9.9 shows that the Galois group of $f$ over the field that we obtain is a subgroup of the Klein four group. Therefore $f$ can be solved by a sequence of at most two more square root extensions. □


Theorem 16.12.4 Let $f$ be an irreducible polynomial of degree 5 over a subfield $F$ of the complex numbers, whose Galois group $G$ is either the alternating group $A_5$ or the symmetric group $S_5$. Then the roots of $f$ are not solvable over $F$.

Proof. If $G = S_5$, we replace $F$ by the quadratic extension $F(\delta)$, where $\delta$ is the square root of the discriminant. If we can solve over $F$, we can solve over the larger field $F(\delta)$. So we may assume that $G$ is the alternating group $A_5$, a simple group (7.5.4).

Our strategy is as follows: We consider a Galois extension $F'/F$ of prime degree $p$, with Galois group $G'$, a cyclic group of order $p$, and we show that no progress toward solving the equation $f = 0$ is made when one replaces $F$ by $F'$. We do this by showing that the Galois group of $f$ over $F'$ is again the alternating group $A_5$. Because $A_5$ contains an element of order 5, it cannot be the Galois group of a reducible polynomial of degree 5. So $f$ remains irreducible over $F'$. Therefore there is no chain of type (16.12.2)(b), and the roots of $f$ are not solvable.

We choose such an extension $F'$, and then we have two Galois extensions. The first, $K/F$, is the splitting field of the quintic polynomial $f$ over $F$. Its Galois group is $G = A_5$. The second, $F'/F$, has a cyclic Galois group $G'$ of order $p$, and since it is a Galois extension, it is the splitting field of some irreducible polynomial $g$ over $F$.

Let $K'$ be the splitting field over $F$ of the product polynomial $fg$. It is generated by the complex roots $\alpha_1, \ldots, \alpha_5$ and $\beta_1, \ldots, \beta_p$ of $f$ and $g$, respectively. The roots $\alpha_i$ generate the splitting field $K$ of $f$, and the roots $\beta_j$ generate the splitting field $F'$ of $g$. The inclusions among the four fields are shown in the diagram below. Each of the extension fields is a Galois extension, and the Galois groups have been labeled in the diagram.


K’' 
Hie 
x, 
K G F’ 
tee ee 
a 3 


Since K is a Galois extension of F, G is isomorphic to the quotient group G/ HA’, and since 
F' is a Galois extension of F’, G’ is isomorphic to the quotient group G/H (16.7.5). Our plan 
is to show that H is isomorphic to G, ie., that H is the alternating group As. 


The group 7’ consists of the F-automorphisms of K’ that fix the roots w;, and 7 
consists of the F-automorphisms that fix the roots 6;. If an F-automorphism of K’ fixes the 
roots @; and also the roots £ ;, then since these roots generate K’, it is the identity. Therefore 
H 0 At’ is the trivial group. 

We restrict the canonical map $\mathcal{G} \to \mathcal{G}/H \approx G'$ to the subgroup $H'$. The kernel of this
restriction is the trivial group $H \cap H'$, so the restriction is injective. It maps $H'$ isomorphically
to a subgroup of $G'$. By hypothesis, $G'$ is cyclic of prime order p. So there are only two
possibilities: either $H'$ is the trivial group, or else $H'$ is cyclic of order p.

Case 1: $H'$ is the trivial group. Then the surjective map from $\mathcal{G}$ to the quotient group
$\mathcal{G}/H' \approx G$ is an isomorphism, and $\mathcal{G}$ is isomorphic to the simple group $G = A_5$. This makes
the existence of a surjective map from $\mathcal{G}$ to the cyclic quotient group $\mathcal{G}/H \approx G'$ impossible.
So this case is ruled out.


Case 2: $H'$ is cyclic of order p. Then $|\mathcal{G}| = |G||H'| = p|G|$ and also $|\mathcal{G}| = |G'||H| = p|H|$.
Therefore G and H have the same order, 60. We restrict the canonical map $\mathcal{G} \to \mathcal{G}/H' \approx G$
to the subgroup H. The kernel of this restriction is the trivial group $H \cap H'$, so the restriction
is injective. It maps H isomorphically to a subgroup of G. Since both groups have order 60,
the restriction is an isomorphism, and $H \approx G = A_5$. □


We now exhibit an irreducible polynomial of degree 5 over Q whose Galois group
is $S_5$. The facts that 5 is a prime integer and that the Galois group G acts transitively on
the roots $\alpha_1, \ldots, \alpha_5$ limit the possible Galois groups. Since the action is transitive, |G| is
divisible by 5. Thus G contains an element of order 5. The only elements of order 5 in $S_5$ are
the 5-cycles. We leave the next lemma as an exercise.


Lemma 16.12.5 If a subgroup G of $S_5$ contains a 5-cycle and also a transposition, then
$G = S_5$.
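
The lemma is left as an exercise, but because $S_5$ has only 120 elements it can also be checked by brute force. The following Python sketch is our own illustration, not part of the text: it represents permutations of {0, ..., 4} as tuples, and the helper names `compose`, `generated_subgroup`, `is_five_cycle`, and `is_transposition` are hypothetical. It closes each pair (5-cycle, transposition) under composition and records the order of the subgroup obtained.

```python
from itertools import permutations


def compose(p, q):
    """Return the permutation 'apply q first, then p' on {0, ..., 4}."""
    return tuple(p[q[i]] for i in range(len(q)))


def generated_subgroup(generators):
    """Close the generators under composition.

    In a finite group every inverse is a positive power, so the set of all
    words in the generators is already the subgroup they generate.
    """
    identity = tuple(range(len(generators[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def is_five_cycle(p):
    """A permutation of 5 points is a 5-cycle iff the orbit of 0 has 5 points."""
    seen, i = set(), 0
    while i not in seen:
        seen.add(i)
        i = p[i]
    return len(seen) == 5


def is_transposition(p):
    """A transposition moves exactly two points."""
    return sum(1 for i in range(5) if p[i] != i) == 2


all_perms = list(permutations(range(5)))
five_cycles = [p for p in all_perms if is_five_cycle(p)]
transpositions = [p for p in all_perms if is_transposition(p)]

# Orders of the subgroups generated by every (5-cycle, transposition) pair.
orders = {
    len(generated_subgroup([c, t]))
    for c in five_cycles
    for t in transpositions
}
```

A finite check of this kind is not a proof of the lemma in the sense of the exercise, but it confirms the statement for every one of the 240 pairs.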


Corollary 16.12.6 Let f(x) be an irreducible polynomial of degree 5 over Q. If f has exactly 
three real roots, its Galois group G is the symmetric group, and hence its roots are not 
solvable. 


Proof. Let the roots be $\alpha_1, \ldots, \alpha_5$, with $\alpha_1, \alpha_2, \alpha_3$ real and $\alpha_4, \alpha_5$ complex, and let K be
the splitting field of f. The only permutations of the roots that fix the first three roots are
the identity and the transposition (4 5). Since $F(\alpha_1, \alpha_2, \alpha_3) \neq K$, that transposition must be
in G. Since G operates transitively on the roots, it contains an element of order 5, a 5-cycle.
So $G = S_5$. □


Example 16.12.7. The polynomial $x^5 - 16x = x(x^2 - 4)(x^2 + 4)$ has three real roots. Of
course it is reducible, but we can add a small constant without changing the number of
real roots. This is seen by looking at the graph of the polynomial. For instance, $x^5 - 16x + 2$
also has three real roots, and it is irreducible over Q. Its roots are not solvable over Q. □
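
The two claims in the example can be checked numerically. The sketch below is our own illustration, with hypothetical helper names, and rests on two assumptions that are not spelled out in the example: irreducibility is tested with the Eisenstein criterion at the prime 2, and the root count uses the fact that the derivative $5x^4 - 16$ has only two real zeros, so f has at most three real roots. Three sign changes at hand-picked sample points then give at least three.

```python
def f(x):
    """The quintic of Example 16.12.7."""
    return x**5 - 16 * x + 2


# Coefficients of x^5 - 16x + 2, from the leading term down to the constant.
coefficients = [1, 0, 0, 0, -16, 2]


def eisenstein(coeffs, p):
    """Eisenstein criterion at p: p does not divide the leading coefficient,
    p divides every other coefficient, and p^2 does not divide the constant."""
    lead, *rest = coeffs
    return (
        lead % p != 0
        and all(c % p == 0 for c in rest)
        and rest[-1] % (p * p) != 0
    )


# Sample points chosen by hand; f changes sign between consecutive points.
samples = [-3, -1, 1, 2]
values = [f(x) for x in samples]
signs = [v > 0 for v in values]
sign_changes = sum(a != b for a, b in zip(signs, signs[1:]))
```

Here `values` is `[-193, 17, -13, 2]`, so each of the three intervals between consecutive sample points contains a real root.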


We now prove Proposition 16.12.2. 


Lemma 16.12.8 Let K/F be a Galois extension whose Galois group G is abelian. There is
a chain of intermediate fields $F = F_0 \subset F_1 \subset \cdots \subset F_r = K$ such that $F_i/F_{i-1}$ is a Galois
extension of prime degree for each i.


Proof. The abelian group G contains a subgroup H of prime order. This subgroup corresponds
to an intermediate field L, and K is a Galois extension of L with group H. Because G
is abelian, H is a normal subgroup, and therefore L is a Galois extension of F with abelian
Galois group $\overline{G} = G/H$. Since $\overline{G}$ has smaller order than G, induction completes the proof. □


Proof of Proposition 16.12.2. (a) ⇒ (b) We begin with the chain of fields (a), and we add
more extensions and more fields to the chain to arrive at a chain having the properties
(b). First, since $\sqrt[rs]{a} = \sqrt[r]{\sqrt[s]{a}}$, we can, at the cost of adding intermediate fields, suppose that
all the roots that occur in our chain are pth roots for various primes p. We make a note of
the primes $p_1, \ldots, p_k$ that occur, and set this chain aside for the moment.

We go back to the field F, and to start, we adjoin the $p_\nu$th roots of unity for
$\nu = 1, \ldots, k$, one after the other. Each of these extensions is Galois, with a cyclic Galois
group (Proposition 16.10.2(b)). Lemma 16.12.8 shows that each of them contains a chain
whose layers are Galois extensions of prime degree.

Let $F'$ be the field we obtain. We continue by adjoining the roots that we were
given, but to $F'$. By Kummer theory, each of these root adjunctions will now be a Galois
extension with a cyclic Galois group of prime order, unless it becomes a trivial extension.
The field $K'$ that we obtain at the end of our new chain will contain the last field K of
the chain given to start, so $\alpha$ will be an element of $K'$. Therefore this new chain is one of
the form (b).


(b) ⇒ (a) Suppose that we are given a chain (b), and consider one of the extensions in the
chain, say $F_{i-1} \subset F_i$. It is a Galois extension of prime degree, say degree p. Theorem 16.11.1
shows that this extension is obtained by adjoining a pth root, provided that the pth roots of
unity are in $F_{i-1}$. So we enlarge the chain, beginning by adjoining the required pth roots of
unity to F. The enlarged chain will satisfy condition (a). □


Il paraît après cela qu'il n'y a aucun fruit à tirer
de la solution que nous proposons.


—Evariste Galois 


EXERCISES 


Section 1 Symmetric Functions 


1.1. Determine the orbit of the polynomial below. If the polynomial is symmetric, write it in 
terms of the elementary symmetric functions. 


(a) $u_1^2 u_2 + u_2^2 u_3 + u_3^2 u_1$ (n = 3),
(b) $(u_1 + u_2)(u_2 + u_3)(u_1 + u_3)$ (n = 3),
(c) $(u_1 - u_2)(u_2 - u_3)(u_1 - u_3)$ (n = 3),
(d) $u_1^3 u_2 + u_2^3 u_3 + u_3^3 u_1 - u_1 u_2^3 - u_2 u_3^3 - u_3 u_1^3$ (n = 3),
(e) $u_1^3 + u_2^3 + \cdots + u_n^3$.
1.2. Find two bases for the ring of symmetric polynomials, as a module over the ring R. 
*1.3. Let $w_k = u_1^k + \cdots + u_n^k$.


(a) Prove Newton's identities: $w_k - s_1 w_{k-1} + \cdots \pm s_{k-1} w_1 \mp k s_k = 0$.
(b) Do $w_1, \ldots, w_n$ generate the ring of symmetric functions?


Section 2 The Discriminant
2.1. Prove that the discriminant is a symmetric function. 
2.2. (a) Prove that the discriminant of a real cubic is non-negative if and only if the cubic has 
three real roots. 
(b) Suppose that a real quartic polynomial has a positive discriminant. What can you say 
about the number of real roots? 
2.3. (a) Prove that the Tschirnhausen substitution (16.2.6) does not change the discriminant 
of a cubic polynomial. 
(b) Determine the coefficients p and q in (16.2.7) that are obtained from the general
cubic (16.2.4) by the Tschirnhausen substitution.
2.4. Use undetermined coefficients to determine the discriminant of the polynomial
(a) $x^3 + px + q$, (b) $x^4 + px + q$, (c) $x^5 + px + q$.
2.5. Use the systematic method on the discriminant in four variables, to determine the 
coefficients in A(s;, ..., 84) of all monomials not divisible by sq. 


2.6. Let uv, =u; + t,i = 1, 2,3. Compute the derivatives 4 5;(u') and fa’), and use your 
results to verify Formula 16.2.5 for the discriminant of a cubic. 


2.7. There are n variables. Let $m = u_2 u_3^2 u_4^3 \cdots u_n^{n-1}$ and let $p(u) = \sum_{\sigma \in A_n} \sigma(m)$. The
$S_n$-orbit of p(u) contains two elements, p and another polynomial q. Prove that
$(p - q)^2 = D(u)$.


Section 3 Splitting Fields


3.1. Let f be a polynomial of degree n with coefficients in F and let K be a splitting field for 
f over F. Prove that [K: F] divides n!. 
3.2. Determine the degrees of the splitting fields of the following polynomials over Q:
(a) $x^3 - 2$, (b) $x^4 - 1$, (c) $x^4 + 1$.
3.3. Let $F = \mathbb{F}_2(u)$ be the field of rational functions over the prime field $\mathbb{F}_2$. Prove that
the polynomial $x^2 - u$ is irreducible over F, and that it has a double root in a splitting
field.


Section 4 Isomorphisms of Field Extensions 


4.1. (a) Determine all automorphisms of the field $\mathbb{Q}(\sqrt[3]{2})$, and of the field $\mathbb{Q}(\sqrt[3]{2}, \omega)$, where
$\omega = e^{2\pi i/3}$.
(b) Let K be the splitting field over Q of $f(x) = (x^2 - 2x - 1)(x^2 - 2x - 7)$. Determine
all automorphisms of K.


Section 5 Fixed Fields


5.1. For each of the following sets of automorphisms of the field of rational functions C(t),
determine the group of automorphisms that they generate, and determine the fixed field
explicitly.


(a) $\sigma(t) = t^{-1}$, (b) $\sigma(t) = it$, (c) $\sigma(t) = -t$, $\tau(t) = t^{-1}$,
(d) $\sigma(t) = \omega t$, $\tau(t) = t^{-1}$, where $\omega = e^{2\pi i/3}$.
5.2. Show that the automorphisms $\sigma(t) = \dfrac{t + i}{t - i}$ and $\tau(t) = \dfrac{it - i}{t + 1}$ of C(t) generate a group
isomorphic to the alternating group $A_4$, and determine the fixed field of this group.
5.3. Let F = C(t) be the field of rational functions in t. Prove that every element of F that is
not in C is transcendental over C.

Section 6 Galois Extensions
6.1. Let $\alpha$ be a complex root of the polynomial $x^3 + x + 1$ over Q, and let K be a splitting
field of this polynomial over Q. Is $\sqrt{-31}$ in the field $\mathbb{Q}(\alpha)$? Is it in K?


6.2. Let $K = \mathbb{Q}(\sqrt{2}, \sqrt{3}, \sqrt{5})$. Determine [K : Q], prove that K is a Galois extension of Q,
and determine its Galois group.


6.3. Let $K \supset L \supset F$ be a chain of extension fields of degree 2. Show that K can be generated
over F by the root of an irreducible quartic polynomial of the form $x^4 + bx^2 + c$.

Section 7 The Main Theorem


7.1. Determine the intermediate fields of an extension field of the form $F(\sqrt{a}, \sqrt{b})$ without
appealing to the Main Theorem.


7.2. Let K/F be a Galois extension such that $G(K/F) \approx C_2 \times C_{12}$. How many intermediate
fields L are there with (a) [L : F] = 4, (b) [L : F] = 9, (c) $G(K/L) \approx C_4$?


7.3. How many intermediate fields L with [L : F] = 2 are there when K/F is a Galois
extension with Galois group (a) the alternating group $A_4$, (b) the dihedral group $D_4$?
7.4. Let F = Q and $K = \mathbb{Q}(\sqrt{2}, \sqrt{3}, \sqrt{5})$. Determine all intermediate fields.


7.5. Let f(x) be an irreducible cubic polynomial over Q whose Galois group is $S_3$. Determine
the possible Galois groups of the polynomial $(x^3 - 1) f(x)$.


7.6. Let K/F be a Galois extension whose Galois group is the symmetric group $S_3$. Is K the
splitting field of an irreducible cubic polynomial over F?


7.7. (a) Determine the irreducible polynomial for $i + \sqrt{2}$ over Q.
(b) Prove that the set $(1, i, \sqrt{2}, i\sqrt{2})$ is a basis for $\mathbb{Q}(i, \sqrt{2})$ over Q.

7.8. Let $\alpha$ denote the positive real fourth root of 2. Factor the polynomial $x^4 - 2$ into
irreducible factors over each of the fields Q, $\mathbb{Q}(\sqrt{2})$, $\mathbb{Q}(\sqrt{2}, i)$, $\mathbb{Q}(\alpha)$, $\mathbb{Q}(\alpha, i)$.


7.9. Let $\zeta = e^{2\pi i/5}$. Prove that $K = \mathbb{Q}(\zeta)$ is a splitting field for the polynomial $x^5 - 1$
over Q, and determine the degree [K : Q]. Without using Theorem 16.7.1, prove that K is
a Galois extension of Q, and determine its Galois group.


7.10. Let K/F be a Galois extension with Galois group G, and let H be a subgroup of G. 
Prove that there exists an element $\beta \in K$ whose stabilizer is equal to H.


7.11. Let $\alpha = \sqrt[3]{2}$, $\beta = \sqrt{3}$, and $\gamma = \alpha + \beta$. Let L be the field $\mathbb{Q}(\alpha, \beta)$, and let K be the
splitting field of the polynomial $(x^3 - 2)(x^2 - 3)$ over Q.


(a) Determine the irreducible polynomial f for $\gamma$ over Q, and its roots in C.
(b) Determine the Galois group of K/Q.
Section 8 Cubic Equations 


8.1. Let K/F be a Galois extension whose group G is the Klein four group $D_2$. Prove that K
can be obtained by adjoining two square roots to F, and explain how G acts on K.

8.2. Determine the Galois groups of the following polynomials over Q:
(a) $x^3 - 2$, (b) $x^3 + 3x + 14$, (c) $x^3 - 3x^2 + 1$, (d) $x^3 - 21x + 7$,
(e) $x^3 + x^2 - 2x - 1$, (f) $x^3 + x^2 - 2x + 1$.

8.3. Determine the quadratic polynomial g(x) that appears in (16.8.2) explicitly, in terms of
$\alpha_1$ and the coefficients of f.

8.4. Let $K = \mathbb{Q}(\alpha)$, where $\alpha$ is a root of the polynomial $x^3 + 2x + 1$, and let $g(x) = x^3 + x + 1$.
Does g(x) have a root in K?

8.5. Let $\alpha_i$ be the roots of a cubic polynomial $f(x) = x^3 + px + q$. Find a formula for a second
root $\alpha_2$ in terms of the elements $\alpha_1$, $\delta$, and the coefficients of f.


Section 9 Quartic Equations


9.1. Let K be a Galois extension of F whose Galois group is the symmetric group $S_4$. Which
integers occur as degrees of elements of K over F?

9.2. With reference to Example 16.9.2(a), write the element $\alpha + \alpha'$ as a nested square root.
What other nested square roots does K contain?

9.3. Can $\sqrt{4 + \sqrt{7}}$ be written in the form $\sqrt{a} + \sqrt{b}$, with rational numbers a and b?

9.4. (a) Prove that the polynomial $x^4 - 8x^2 + 11$ is irreducible over Q in two ways: using the
methods of Chapter 12 and computing with its roots.
(b) Do the same for the polynomial $x^4 - 8x^2 + 9$.
(c) Determine all intermediate fields when K is the splitting field of $x^4 - 8x^2 + 11$
over Q.

9.5. Consider a nested square root $\alpha = \sqrt{r + \sqrt{t}}$ with r and t in a field F. Assume that $\alpha$ has
degree 4 over F, let f be the irreducible polynomial of $\alpha$ over F, and let K be a splitting
field of f over F.
(a) Compute the irreducible polynomial f(x) for $\alpha$ over F. Prove that G(K/F) is one
of the groups $D_4$, $C_4$, or $D_2$.
(b) Explain how to determine the Galois group in terms of the element $r^2 - t$.
(c) Assume that the Galois group of K/F is the dihedral group $D_4$. Determine
generators for all intermediate fields $F \subset L \subset K$.

9.6. Compute the discriminant of the quartic polynomial $x^4 + 1$, and determine its Galois
group over Q.

9.7. Assume that an extension field K/F has the form $K = F(\sqrt{a}, \sqrt{b})$. Determine all nested
square roots $\sqrt{r + \sqrt{t}}$ that are in K, with r and t in F.

9.8. Determine whether or not the following nested radicals can be written in terms of
unnested square roots, and if so, find an expression.
(a) $\sqrt{2 + \sqrt{11}}$, (b) $\sqrt{10 + 5\sqrt{2}}$, (c) $\sqrt{11 + 6\sqrt{2}}$, (d) $\sqrt{6 + \sqrt{11}}$, (e) $\sqrt{11 + \sqrt{6}}$.

9.9. (a) Determine the discriminant and the resolvent cubic of a polynomial of the form
$f(x) = x^4 + rx + s$.
(b) Determine the Galois groups of $x^4 + 8x + 12$ and $x^4 + 8x - 12$ over Q.
(c) Can the roots of the polynomial $x^4 + x - 5$ be constructed by ruler and compass?

9.10. (a) What are the possible Galois groups of an irreducible quartic polynomial over Q that
has exactly two real roots?
(b) What are the possible Galois groups over Q of an irreducible quartic polynomial
f(x) whose discriminant is negative?


9.11. Let F = Q, and let K be the splitting field of the polynomial $f(x) = x^4 - 2$ over F. The
roots are $\alpha$, $-\alpha$, $i\alpha$, $-i\alpha$, with $\alpha = \sqrt[4]{2}$.
(a) Determine the Galois group G = G(K/F), and the subgroup H = G(K/F(i)).
(b) Show how each element of H permutes the roots of f. 
(c) Find all intermediate fields. 


9.12. Determine the Galois groups of the following polynomials over Q. 
(a) $x^4 + 4x^2 + 2$, (b) $x^4 + 2x^2 + 4$, (c) $x^4 + 1$,
(d) $x^4 + x + 1$, (e) $x^4 + x^3 + x^2 + x + 1$, (f) $x^4 + x^2 + 1$.


9.13. Let K be the splitting field over Q of the polynomial $x^4 - 2x^2 - 1$. Determine the Galois
group G of K/Q, find all intermediate fields, and match them up with the subgroups 
of G. 


*9.14. Let $F = \mathbb{Q}(\omega)$, where $\omega = e^{2\pi i/3}$. Determine the Galois group over F of the splitting
field of (a) $\sqrt[3]{2 + \sqrt{2}}$, (b) $\sqrt{2 + \sqrt[3]{2}}$.


*9.15. Let K be the splitting field of an irreducible quartic polynomial f(x) over F, and let
the roots of f(x) in K be $\alpha_1, \alpha_2, \alpha_3, \alpha_4$. Assume that the resolvent cubic g(x) has a
root $\beta_1 = \alpha_1\alpha_2 + \alpha_3\alpha_4$ in F. Express the root $\alpha_1$ explicitly in terms of nested square
roots.


9.16. Determine the resolvent cubic of the general quartic polynomial (16.9.4). 


9.17. Determine the real numbers $\alpha$ of degree 4 over Q that can be constructed with ruler and
compass, in terms of the Galois groups of their irreducible polynomials.


9.18. Prove that any Galois extension whose Galois group is the dihedral group $D_4$ is the
splitting field of a polynomial of the form $x^4 + bx^2 + c$.


Section 10 Roots of Unity 
10.1. Determine the degree of $\zeta_7$ over the field $\mathbb{Q}(\zeta_3)$.
10.2. Let $\zeta = \zeta_{17}$. Find generators for the intermediate field $L_2$ described in Example 16.10.3.
10.3. Let $\zeta = \zeta_7$. Determine the degree of the following elements over Q.
E+E, HE+S, OE+O +e. 
10.4. Let $\zeta = \zeta_{13}$. Determine the degrees of the following elements over Q.
CHE? , DTHE, EHO HS, MEHO HS, OPPO +O + sy, 
OQee ae He heer Soe eh eer: 
10.5. Let $K = \mathbb{Q}(\zeta_p)$. Determine explicitly all intermediate fields when
(a) p = 5, (b) p = 7, (c) p = 11, (d) p = 13.
10.6. (a) Carry out the proof of Theorem 16.10.12.
(b) Prove the Kronecker-Weber Theorem for quadratic extensions.

10.7. Let $\zeta_n = e^{2\pi i/n}$ and let $K = \mathbb{Q}(\zeta_n)$.
(a) Prove that K is a Galois extension of Q.
(b) Define an injective homomorphism $G(K/\mathbb{Q}) \to U$ to the group U of units in the
ring Z/(n).
(c) Prove that this homomorphism is bijective when n = 6, 8, 12. (In fact, this map is
always bijective.)

10.8. Determine the Galois groups of the polynomials $x^8 - 1$, $x^{12} - 1$, $x^9 - 1$.

10.9. Let $f(x) = (x - \alpha_1) \cdots (x - \alpha_n)$.
(a) Prove that the discriminant of f is $\pm f'(\alpha_1) \cdots f'(\alpha_n)$, where $f'$ is the derivative of f,
and determine the sign.
(b) Use the formula to compute the discriminant of the polynomial $x^p - 1$, and use it to
give another proof of Theorem 16.10.12.

10.10. With regard to the eigenvector y described at the end of Section 16.11, show that at least
one of the elements $y_i = \alpha_1 + \zeta^i \alpha_2 + \cdots + \zeta^{i(p-1)} \alpha_p$ isn't zero.


Section 11 Kummer Extensions


11.1. Prove that if the discriminant of an irreducible cubic polynomial in F[x] is not a square
in F, then the roots cannot be obtained by adjoining a cube root to F.

11.2. (a) Prove Proposition 16.11.2 without using Galois theory.
(b) With F arbitrary, prove that if $x^p - a$ is reducible in F[x], then it has a root in F.

*11.3. Let F be a subfield of C that contains i, and let K be a Galois extension of F whose
group is $C_4$. Is it true that K has the form $F(\alpha)$, with $\alpha^4$ in F?

11.4. Carry out the computation to arrive at Cardano's formula (16.13.3).

11.5. (a) How does Cardano's formula (16.13.3) express the roots of the polynomials $x^3 + 3x$,
$x^3 + 2$, $x^3 - 3x + 2$ and $x^3 - 3x + 2$?
(b) What are the correct choices of roots in Cardano's formula?


Section 12 Quintic Equations


12.1. Is every Galois extension of degree 10 solvable?
12.2. Determine the transitive subgroups of $S_5$.

12.3. Let G be the Galois group of an irreducible quintic polynomial. Show that if G contains
an element of order 3, then G is either $S_5$ or $A_5$.

12.4. Let $s_1, \ldots, s_n$ be the elementary symmetric functions in variables $u_1, \ldots, u_n$, and let F
be a field.
(a) Prove that the field F(u) of rational functions in $u_1, \ldots, u_n$ is a Galois extension of
the field $F(s_1, \ldots, s_n)$, and that its Galois group is the symmetric group $S_n$.
(b) Suppose that n = 5, and let $w = u_1u_2 + u_2u_3 + u_3u_4 + u_4u_5 + u_5u_1$. Determine the
Galois group of F(u) over the field F(s, w).
(c) Let G be a finite group. Prove that there exists a field F and a Galois extension K
of F whose Galois group is G.

12.5. Let K be a Galois extension of Q whose degree is a power of 2, and such that $K \subset \mathbb{R}$.
Prove that the elements of K can be constructed by ruler and compass.


12.6. Prove that if the Galois group of a polynomial f is a nonabelian simple group, then the 
roots are not solvable. 


12.7. Find a polynomial of degree 7 over Q whose Galois group is S7. 


12.8. Let p be a prime. Prove that the symmetric group Sp is generated by any p-cycle together 
with any transposition. 


Miscellaneous Problems 


M.1. Let $F_1 \subset F_2$ be a field extension, and let f be a polynomial with coefficients in $F_1$. A
splitting field $K_2$ of f over $F_2$ will contain a splitting field $K_1$ of f over $F_1$. What is the
relation between the Galois groups $G(K_1/F_1)$ and $G(K_2/F_2)$?


M.2. Let L/F and K/L be Galois extensions. Is K /F necessarily a Galois extension? 
M.3. (Vandermonde determinant) 


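(a) Prove that the determinant of the matrix

$$\begin{pmatrix} 1 & u_1 & u_1^2 & \cdots & u_1^{n-1} \\ 1 & u_2 & u_2^2 & \cdots & u_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & u_n & u_n^2 & \cdots & u_n^{n-1} \end{pmatrix}$$

is a constant multiple of the square root of the discriminant $\delta(u) = \prod_{i<j}(u_i - u_j)$.
(b) Determine the constant.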


M.4. (a) The non-negative real numbers are those having a real square root. Use this fact to 
prove that the field R has no automorphism except the identity. 
*(b) Prove that C has no continuous automorphisms other than complex conjugation and 
the identity. 


M.5. Let $K = \mathbb{F}_q$, where $q = p^r$, and let $F = \mathbb{F}_p$.


(a) Prove that the Frobenius map $\varphi$ defined by $\varphi(x) = x^p$ is an automorphism of K.

(b) Prove that the Galois group G(K/F) is a cyclic group of order r that is generated
by the Frobenius map $\varphi$.

(c) Prove that the Main Theorem of Galois theory is true for the extension K/F.

M.6. ¹Let K be a subfield of C, and let G be its group of automorphisms. We can view G as
acting on the point set K in the complex plane. The action will probably be discontinuous,
but nevertheless, we can define an action on line segments $[\alpha, \beta]$ whose endpoints are in
K by defining $g[\alpha, \beta] = [g\alpha, g\beta]$. Then G also acts on polygons whose vertices are in K.


(a) Let $K = \mathbb{Q}(\zeta)$ where $\zeta$ is a primitive fifth root of 1. Find the G-orbit of the regular
pentagon whose vertices are $1, \zeta, \zeta^2, \zeta^3, \zeta^4$.

(b) Let $\alpha$ be the side length of the pentagon of (a). Show that $\alpha^2$ is in K, and find the
irreducible equation for $\alpha$ over Q.


¹In memory of Bruce Renshaw.


*M.7. A polynomial f in $F[u_1, \ldots, u_n]$ is ½-symmetric if $f(u_{\sigma 1}, \ldots, u_{\sigma n}) = f(u_1, \ldots, u_n)$
for every even permutation $\sigma$, and skew-symmetric if $f(u_{\sigma 1}, \ldots, u_{\sigma n}) = (\operatorname{sign} \sigma)\, f(u_1, \ldots, u_n)$
for every permutation $\sigma$.

(a) Prove that the square root of the discriminant $\delta = \prod_{i<j}(u_i - u_j)$ is skew-symmetric.

(b) Prove that every ½-symmetric polynomial has the form $f + g\delta$, where f, g are
symmetric polynomials.


*M.8. ²With variables $u_0, u_1, u_2, u_3$, let $p_i = (u_i - u_{i+1})(u_i - u_{i+2})(u_{i+1} - u_{i+2})$, indices read
modulo 4. Determine


(a) 32, “+, () 3 ae 
Hi=0 Pit? 1=0 pit’ 
*M.9. Let f(t, x) be an irreducible polynomial in C[t, x] that is monic and cubic when regarded
as a polynomial in x. Assume that for some $t_0$, the polynomial $f(t_0, x)$ has one simple
root and one double root. Prove that the splitting field K of f(t, x) over C(t) has degree 6.


*M.10. Let K be a finite extension of a field F, and let f(x) be in K[x]. Prove that there is a
nonzero polynomial g(x) in K[x] such that the product f(x)g(x) is in F[x].


*M.11. Let f(x) be an irreducible quartic polynomial in F[x] and let $\alpha_1, \alpha_2, \alpha_3, \alpha_4$ be its roots
in a splitting field K. Assume that the resolvent cubic has a root $\beta = \alpha_1\alpha_2 + \alpha_3\alpha_4$ in F,
but that the discriminant D is not a square in F. According to (16.9.9), the Galois group
of K/F is either $C_4$ or $D_4$.

(a) Determine the subgroup H of the group $S_4$ of permutations of the roots $\alpha_i$ that
stabilizes $\beta$ explicitly. Don't forget to prove that no permutations other than those
you list fix $\beta$.

(b) Let $\gamma = \alpha_1\alpha_2 - \alpha_3\alpha_4$ and $\epsilon = \alpha_1 + \alpha_2 - \alpha_3 - \alpha_4$. Prove that $\gamma^2$ and $\epsilon^2$ are in F.

(c) Let $\delta$ be the square root of the discriminant. Prove that if $\gamma \neq 0$, then $\delta\gamma$ is a square
in F if and only if $G = C_4$. Similarly, prove that if $\epsilon \neq 0$, then $\delta\epsilon$ is a square in F if
and only if $G = C_4$.

(d) Prove that $\gamma$ and $\epsilon$ can't both be zero.


*M.12. A finite group G is solvable if it contains a chain of subgroups $G = H_0 \supset H_1 \supset \cdots \supset
H_k = \{1\}$ such that for every $i = 1, \ldots, k$, $H_i$ is a normal subgroup of $H_{i-1}$, and the
quotient group $H_{i-1}/H_i$ is a cyclic group. Let f be an irreducible polynomial over a field
F, and let G be its Galois group. Prove that the roots of f are solvable over F if and only
if G is a solvable group.


*M.13. ³Let K/F be a Galois extension with Galois group G. If we think of K as an F-vector
space, we obtain a representation of G on K. Let $\chi$ denote the character of this
representation. Show that if F contains enough roots of unity, then $\chi$ is the character of
the regular representation.


Wie weit diese Methoden reichen werden, muss erst 
die Zukunft zeigen. 


—Emmy Noether 


²Suggested by Harold Stark.
³Suggested by Galyna Dobrovolska.


APPENDIX


Background Material 


Historically speaking, it is of course quite untrue 

that mathematics is free from contradiction; 

non-contradiction appears as a goal to be achieved, 

not as a God-given quality that has been granted us once for all. 


—Nicolas Bourbaki 


A.1 ABOUT PROOFS 


What mathematicians consider an appropriate way to present a proof is not easy to make 
clear. One cannot give proofs that are complete in the sense that every step consists in 
applying a rule of logic to the previous step. Writing such a proof would take too long, and 
the main points wouldn’t be emphasized. On the other hand, all difficult steps of the proof 
are supposed to be included. Someone reading the proof should be able to fill in as many 
details as needed to understand it. How to write a proof is a skill that can be learned only by 
experience. 


Three general methods used to construct a proof are dichotomy, induction, and 
contradiction. 

The word dichotomy means division into two parts. It is used to subdivide a problem 
into smaller, more easily managed pieces. Other names for this procedure are case analysis 
and divide and conquer. 

Here is an example of dichotomy: By definition, the binomial coefficient $\binom{n}{k}$ (read n
choose k) is the number of subsets of order k in the set of indices {1, 2, ..., n}. For example,
$\binom{4}{2} = 6$. The set {1, 2, 3, 4} has six subsets of order 2.


Proposition A.1.1 For every integer n and every $k \le n$, $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$.


Proof. Let S be a subset of {1, 2, ..., n} of order k. Then either n is in S or n is not in S.
This is our dichotomy.


Case 1: n is not in S. In this case, S is actually a subset of {1, 2, ..., n − 1}. By definition,
there are $\binom{n-1}{k}$ of these subsets.

Case 2: n is in S. Let $S' = S - \{n\}$ be the set obtained by deleting the index n from the set S.
Then $S'$ is a subset of {1, 2, ..., n − 1}, of order k − 1. There are $\binom{n-1}{k-1}$ such sets $S'$. Hence
there are $\binom{n-1}{k-1}$ subsets of order k that contain n.

This gives us $\binom{n-1}{k} + \binom{n-1}{k-1}$ subsets of order k altogether. □
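
The dichotomy in this proof can be carried out literally for small n. The Python sketch below is our own illustration, with hypothetical function names: it lists the subsets of order k of {1, ..., n}, sorts them according to whether they contain n, and lets us compare the two counts with the numbers of subsets of {1, ..., n − 1} of orders k and k − 1.

```python
from itertools import combinations


def count_subsets(n, k):
    """Number of subsets of order k in {1, ..., n}, found by listing them."""
    return sum(1 for _ in combinations(range(1, n + 1), k))


def split_by_membership(n, k):
    """Sort the subsets of order k by the dichotomy 'n not in S' / 'n in S'.

    Returns the pair (number of subsets without n, number of subsets with n).
    """
    subsets = list(combinations(range(1, n + 1), k))
    without_n = [s for s in subsets if n not in s]
    with_n = [s for s in subsets if n in s]
    return len(without_n), len(with_n)


# The example from the text: {1, 2, 3, 4} has six subsets of order 2.
example_count = count_subsets(4, 2)
# Three of them omit the index 4 and three of them contain it.
example_split = split_by_membership(4, 2)
```

The first count returned by `split_by_membership(n, k)` agrees with `count_subsets(n - 1, k)` (Case 1), and the second with `count_subsets(n - 1, k - 1)` (Case 2).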


The remarkable power of the method of dichotomy is shown here: In each of the two 
cases, n € S andn ¢ S, we have an additional fact about our set S. This additional fact can 
be used in the proof. 

Often a proof will require sorting through several possibilities, examining each in turn.
This is dichotomy, or case analysis. It is analogous to the way Gray's Manual of Botany is used
to determine the species of a plant. The procedure in Gray's Manual leads through a sequence
of dichotomies. A typical one is "leaves opposite," or "leaves alternate." Classification of
mathematical structures will also proceed through a sequence of dichotomies. They need not 
be spelled out formally in simple cases, but when one is dealing with a complicated range of 
possibilities, careful sorting is needed. 

Induction is the main method for proving a sequence of statements $P_n$, indexed by
positive integers n. To prove $P_n$ for all n, the principle of induction requires us to do two
things:


(A.1.2)

(i) prove that $P_1$ is true, and
(ii) prove that if, for some integer $k \ge 1$, $P_k$ is true, then $P_{k+1}$ is also true.


Sometimes it is more convenient to prove that if, for some integer $k > 1$, $P_{k-1}$ is true, then
$P_k$ is true. This is just a change of the index.


Here are some examples of induction. If n is a positive integer, then n! ("n factorial")
is the product $1 \cdot 2 \cdots n$ of the integers from 1 to n. Also, 0! is defined to be 1.


Proposition A.1.3 $\binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$.


Proof. Let $P_n$ be the statement that $\binom{n}{\ell} = \dfrac{n!}{\ell!\,(n-\ell)!}$ for all $\ell = 0, \ldots, n$. You will be able to
check that $P_1$ is true. Assume that $P_{r-1}$ is true. Then the formula is true when we substitute
$n = r - 1$ and $\ell = k$, and is also true when we substitute $n = r - 1$ and $\ell = k - 1$:

$$\binom{r-1}{k} = \frac{(r-1)!}{k!\,(r-1-k)!} \quad\text{and}\quad \binom{r-1}{k-1} = \frac{(r-1)!}{(k-1)!\,(r-k)!}.$$

According to Proposition (A.1.1),

$$\binom{r}{k} = \binom{r-1}{k} + \binom{r-1}{k-1} = \frac{(r-1)!}{k!\,(r-1-k)!} + \frac{(r-1)!}{(k-1)!\,(r-k)!} = \frac{(r-k)\,(r-1)!}{k!\,(r-k)!} + \frac{k\,(r-1)!}{k!\,(r-k)!} = \frac{r!}{k!\,(r-k)!}.$$

This shows that $P_r$ is true. □

As another example, let us prove the "pigeonhole principle." Here |S| denotes the
order, the number of elements, of a set S.


Proposition A.1.4 If $\varphi: S \to T$ is an injective map between finite sets, then T contains at
least as many elements as are in S: $|S| \le |T|$.


Proof. We use induction on n = |S|. The assertion is true if n = 0, that is, if S is empty. We
suppose that the theorem has been proved for n = k − 1, and we proceed to check it for
n = k, where k > 0. We suppose that |S| = k, and we choose an element s of S. Let $t = \varphi(s)$
be the image of s in T. Since $\varphi$ is injective, s is the only element whose image is t. Therefore
$\varphi$ maps the set $S' = S - \{s\}$ obtained by removing s injectively to the set $T' = T - \{t\}$.
Obviously, $|S'| = |S| - 1 = k - 1$ and $|T'| = |T| - 1$. By the induction assumption, $|S'| \le |T'|$,
and so $|S| \le |T|$. □
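
For very small sets the proposition can be confirmed by listing all maps. The sketch below is our own illustration, not part of the text: a map from S = {0, ..., s − 1} to T = {0, ..., t − 1} is stored as the tuple of its values, and the hypothetical helper `injective_maps` keeps the tuples without repeated entries.

```python
from itertools import product


def injective_maps(s_size, t_size):
    """List the injective maps from {0, ..., s_size-1} to {0, ..., t_size-1}.

    A map is encoded as the tuple (image of 0, image of 1, ...); it is
    injective exactly when its entries are pairwise distinct.
    """
    return [
        m
        for m in product(range(t_size), repeat=s_size)
        if len(set(m)) == s_size
    ]


# Pairs of sizes (|S|, |T|) with |S|, |T| <= 4 that admit an injective map.
sizes_with_injection = {
    (s, t)
    for s in range(5)
    for t in range(5)
    if injective_maps(s, t)
}
```

Every pair (s, t) in `sizes_with_injection` satisfies s ≤ t, as the proposition predicts. The empty set S corresponds to s = 0, where the empty tuple is the only map.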


There is a variant of the principle of induction, called complete induction. Here again,
we wish to prove a statement $P_n$ for each positive integer n. The principle of complete
induction asserts that it is enough to prove the following statement:


If n is a positive integer, and if $P_k$ is true for every
positive integer k < n, then $P_n$ is true.


When n = 1, there are no positive integers k < n. The hypothesis in the statement is
automatically satisfied when n = 1. So a proof using complete induction must include a
proof of $P_1$.

The principle of complete induction is used when there is a procedure to reduce $P_n$ to
$P_k$ for some smaller integers k, but not necessarily to $P_{n-1}$. Here is an example:


Theorem A.1.5 Every integer n greater than 1 is a product of prime integers. 


Proof. Let $P_n$ be the statement that n is a product of primes. We assume that $P_k$ is true
for all k < n, and we must prove that $P_n$ is true, i.e., that n is a product of primes. If
n is prime itself, then it is the product of one prime. Otherwise, n can be written as a
product n = ab of positive integers neither of which is equal to 1. Then a and b are
less than n, so the induction hypothesis tells us that $P_a$ and $P_b$ are both true, that is, a
and b are products of primes. Putting these products side by side gives us the required
factorization of n. □
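
The proof by complete induction translates directly into a recursive procedure. The Python sketch below is our own illustration (the function name `prime_factors` is hypothetical): when n = ab with a, b > 1, it factors a and b separately and puts the two lists side by side, exactly as in the proof. Trial division is used to find a, which is a choice made for simplicity and not for efficiency.

```python
from math import isqrt


def prime_factors(n):
    """Return a list of primes whose product is n, for an integer n > 1."""
    if n <= 1:
        raise ValueError("n must be an integer greater than 1")
    # Look for a factorization n = a * b with 1 < a <= b < n.
    for a in range(2, isqrt(n) + 1):
        if n % a == 0:
            b = n // a
            # Both a and b are smaller than n, so the recursion applies
            # (this is the induction hypothesis P_a and P_b).
            return prime_factors(a) + prime_factors(b)
    # No proper divisor exists: n is prime, a product of one prime.
    return [n]


factors_of_360 = prime_factors(360)
```

For instance, `factors_of_360` is the list `[2, 2, 2, 3, 3, 5]`. The recursion terminates because each call is made on a strictly smaller integer, which is the role the induction hypothesis plays in the proof.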


Proofs by contradiction proceed by assuming that the desired conclusion is false and 
deriving a contradiction from this assumption. The conclusion must therefore be true. Such 
proofs are often fakes, in the sense that the argument by contradiction is easily turned into a 
direct proof. Here is an example: 


Proposition A.1.6 Let $\varphi: S \to T$ be an injective map between finite sets. If |S| = |T|,
then $\varphi$ is bijective.


Proof. Since we are given that $\varphi$ is injective, $\varphi$ will be bijective if and only if it is surjective.
We assume that |S| = |T|, but that $\varphi$ is not surjective. Then there is an element t in T, which
is not in the image of S. This being so, $\varphi$ actually maps S injectively to the set $T' = T - \{t\}$.
Then Proposition A.1.4 tells us that $|S| \le |T'| = |T| - 1$, and this contradicts |S| = |T|. □


Try not to arrange proofs this way. The assumption made in the proof that |S| = |T| is
irrelevant. Put positively, the argument shows that if an injective map $\varphi$ isn't bijective, then
|S| < |T|.

If X stands for some statement, we let not X stand for the statement that X is false. 
The assertion “if not B, then not A” is the contrapositive of the assertion “if A, then B,” 
and is logically equivalent with it. The argument presented above proves the contrapositive 
of the assertion of the proposition. 

It isn’t easy to find very simple examples of good proofs by contradiction, but there are 
some in the text. 


A.2 THE INTEGERS


We learn elementary properties of addition and multiplication of integers in elementary 
school, but let us look again, to see what would be required in order to prove some of 
the properties, such as the associative and distributive laws. Complete proofs require a fair 
amount of writing, and we will only make a start here. It is customary to begin by defining 
addition and multiplication for positive integers. Negative numbers are introduced later. 
This means that several cases have to be treated as one goes along, which is boring, or else a 
clever notation has to be found to avoid such a case analysis. We will content ourselves with 
a description of the operations on positive integers. Positive integers are also called natural 
numbers. 
The set N of natural numbers is characterized by these properties: 


Peano’s Axioms 


• The set N contains a particular element 1.

• Successor function: There is a map $\sigma: \mathbb{N} \to \mathbb{N}$ that sends an integer to another
integer, called the successor or next integer. This map is injective, and for every n in
N, $\sigma(n) \neq 1$.

• Induction axiom: Suppose that a subset S of N has these properties:


(i) 1 is an element of S, and
(ii) if n is in S, then $\sigma(n)$ is in S.


Then S contains every natural number: S = N.


The successor σ(n) will turn into n + 1 when addition is defined. At this stage the notation 
n + 1 could be confusing. It is better to use a neutral notation, and we will denote the 
successor by n′ for now. The successor function allows us to use the natural numbers for 
counting, which is the basis of arithmetic. 

The induction property can be described intuitively by saying that the natural numbers 
are obtained from 1 by repeatedly taking the next integer: 


N = {1, 1′, 1″, ...}. 


In other words, counting runs through all natural numbers. This property is the basis of 
induction proofs. 

Peano’s axioms can also be used to make recursive definitions. The phrases recursive 
definition, or inductive definition, refer to the definition of a sequence of objects C_n, indexed 
by the natural numbers, in which each object is defined in terms of the preceding one. For 
instance, a recursive definition of the function x^n is 


x^1 = x and x^(n′) = x^n · x. 


The important points are: 
(A.2.1) C_1 is defined, and a rule is given for determining C_(n′) (= C_(n+1)) from C_n. 


It is intuitively clear that these properties determine the sequence C_n uniquely, though to 
give a quick proof of this fact from Peano’s axioms isn’t easy. We won’t carry the proof out. 
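
As an illustration, the recursive definition of the power function translates almost word for word into a recursive program. This is only a sketch: natural numbers are modeled by Python integers n ≥ 1, and the function name `power` is an assumption made for the example.

```python
def power(x, n: int):
    """x^n for a natural number n >= 1, following x^1 = x and x^(n') = x^n * x."""
    if n < 1:
        raise ValueError("n must be a natural number (n >= 1)")
    if n == 1:
        return x                    # C_1 is defined directly
    return power(x, n - 1) * x      # the rule giving C_(n') from C_n


print(power(2, 10))
```

The base case plays the role of "C_1 is defined", and the recursive call is the rule that determines each object from the preceding one.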

Given the set of positive integers and the ability to make recursive definitions, we can 
define addition and multiplication of positive integers as follows: 


(A.2.2) Addition: m + 1 = m′ and m + n′ = (m + n)′. 
Multiplication: m · 1 = m and m · n′ = m · n + m. 

In these definitions, we take an arbitrary integer m and define addition and multiplication 
for that integer m and for every n recursively. In this way, m + n and m · n are defined for 
all m and n. 
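
The definitions (A.2.2) can be tried out directly. The sketch below models natural numbers as Python integers n ≥ 1 and uses the built-in arithmetic only inside the helpers `succ` and `pred`; both helper names are hypothetical, and `pred` relies on the fact that every natural number other than 1 is a successor.

```python
def succ(n: int) -> int:
    """The successor n' (built-in + is used only here)."""
    return n + 1


def pred(n: int) -> int:
    """The m with m' = n, defined for n != 1 (hypothetical helper)."""
    if n <= 1:
        raise ValueError("1 is not a successor")
    return n - 1


def add(m: int, n: int) -> int:
    # m + 1 = m'   and   m + n' = (m + n)'
    if n == 1:
        return succ(m)
    return succ(add(m, pred(n)))


def mul(m: int, n: int) -> int:
    # m * 1 = m    and   m * n' = m * n + m
    if n == 1:
        return m
    return add(mul(m, pred(n)), m)


print(add(2, 3), mul(3, 4))
```

Checking a law such as (a + b) + n = a + (b + n) on small values with these functions is no substitute for the induction proofs, but it shows concretely what the recursion computes.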

The proofs of the associative, commutative, and distributive laws for the integers are 
exercises in induction that might be called “Peano playing.” We will carry out one of the 
verifications here as a sample. 


Proof of the associative law for addition. We are to prove that for all a, b, and n in N, 
(a + b) + n = a + (b + n). We first check the case n = 1 for all a and b. Three applications 
of the definition give 


(a + b) + 1 = (a + b)′ = a + b′ = a + (b + 1). 


Next, assume the associative law true for a particular value of n and for all a, b. Then we 
verify it for n′ as follows: 


(a + b) + n′ = (a + b) + (n + 1)    (definition) 
             = ((a + b) + n) + 1    (case n = 1) 
             = (a + (b + n)) + 1    (induction hypothesis) 
             = a + ((b + n) + 1)    (case n = 1) 
             = a + (b + (n + 1))    (case n = 1) 
             = a + (b + n′)         (definition). □ 


The proofs of other properties of addition and multiplication follow similar lines. 


A.3 ZORN’S LEMMA 


At a few places in the text, we refer to Zorn’s Lemma, a tool for handling infinite sets. We 
now describe it. 


• A partial ordering of a set S is a relation s ≤ s′, which may hold between certain elements 
and which satisfies the following axioms for all s, s′, s″ in S: 


(A.3.1) 


(i) s ≤ s; 
(ii) if s ≤ s′ and s′ ≤ s″, then s ≤ s″; 
(iii) if s ≤ s′ and s′ ≤ s, then s = s′. 


A partial ordering is called a total ordering if, in addition, 


(iv) for all s, s′ in S, either s ≤ s′ or s′ ≤ s. 


For example, let S be a set whose elements are sets. If A, B are in S, we may define 
A ≤ B if A is a subset of B: A ⊂ B. This is a partial ordering on S, called the ordering by 
inclusion. Whether or not it is a total ordering depends on the particular case. 


An element m of a partially ordered set S is a maximal element if there is no element s 
in S with m ≤ s, except for m itself. A partially ordered set S may contain many different 
maximal elements. For example, a subset V of a set U is a proper subset if V is neither the 
empty set, nor the whole set U. The set of all proper subsets of the set {1, ..., n}, ordered 
by inclusion, contains n maximal elements, one of which is {2, 3, 4, ..., n}. 

A nonempty finite partially ordered set S contains at least one maximal element, but 
an infinite partially ordered set, such as the set of integers, may contain no maximal element 
at all. A totally ordered set contains at most one maximal element. 
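
For a small value of n, the maximal elements in the example of proper subsets can be listed by brute force. The following sketch assumes n = 4 and represents subsets as Python `frozenset` objects, for which `<` tests proper inclusion; the function names are chosen for this illustration only.

```python
from itertools import combinations


def proper_subsets(n: int):
    """All subsets of {1, ..., n} other than the empty set and the whole set."""
    universe = range(1, n + 1)
    return [frozenset(c) for k in range(1, n) for c in combinations(universe, k)]


def maximal_elements(poset):
    """Elements m such that no s in the poset strictly contains m."""
    return [m for m in poset if not any(m < s for s in poset)]


n = 4
maxima = maximal_elements(proper_subsets(n))
print(len(maxima))  # prints 4: the subsets obtained by omitting one element
```

Each maximal element is the complement of a single point, so there are n of them and none of them is comparable with another.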


• If A is a subset of a partially ordered set S, then an upper bound for A is an element b in S 
such that for all a in A, a ≤ b. A partially ordered set S is inductive if every totally ordered 
subset T of S has an upper bound. 


A finite totally ordered set contains a unique maximal element, and is inductive. 


Lemma A.3.2 Zorn’s Lemma. An inductive partially ordered set S has at least one maximal 
element. 


Zorn’s Lemma is equivalent with the axiom of choice, which is known to be independent 
of the basic axioms of set theory. We won’t enter into a further discussion of this equivalence, 
but we will show how Zorn’s Lemma can be used to show that every vector space has 
a basis. 


Proposition A.3.3 Every vector space V over a field F has a basis. 


Proof. Let S be the set whose elements are the linearly independent subsets of V, partially 
ordered by inclusion. We show that S is inductive: Let T be a totally ordered subset of S. 
Then we claim that the union of the sets making up T is also linearly independent. This will 
show that it is in S. To verify this, let 


B = ⋃_{A ∈ T} A 


be the union. By definition, a relation of linear dependence on B is finite, so it can be written 
in the form 


(A.3.4) c_1 v_1 + ··· + c_n v_n = 0, 


with v_i in B. Since B is a union of the sets in T, each v_i is contained in one of these subsets, 
call it A_i. The collection {A_1, ..., A_n} of these subsets is a finite, totally ordered subset of 
T. It has a unique maximal element A. Then v_i is in A for every i = 1, ..., n. But since A is 
in S, it is a linearly independent set. Therefore (A.3.4) is the trivial relation. This shows that 
B is linearly independent, hence that it is an element of S. 


We have verified the hypothesis of Zorn’s Lemma. So S contains a maximal element 
M, and we claim that M is a basis. By definition of S, M is linearly independent. Let 
W = Span(M). If W < V, then we choose an element v in V that is not in W. The set 
M ∪ {v} will be linearly independent. This contradicts the maximality of M and shows that 
W = V, hence that M is a basis. □ 


A similar argument proves Theorem (11.9.2) of Chapter 11: 

Proposition A.3.5 Let R be a ring. Every ideal I ≠ R is contained in a maximal ideal. □ 


A.4 THE IMPLICIT FUNCTION THEOREM 


The Implicit Function Theorem for complex polynomial functions is used a few times in 
this book, and for lack of a reference, we derive it here from the theorem for real valued 
functions that we state below. The theorem for real valued functions can be found in [Rudin], 
Theorem 9.27. 


Theorem A.4.1 Implicit Function Theorem. Let f_1(x, y), ..., f_r(x, y) be functions of n + r 
real variables x_1, ..., x_n, y_1, ..., y_r, which have continuous partial derivatives in an open 
set of R^(n+r) containing the point (a, b). Assume that the Jacobian determinant 

\[
\det \begin{pmatrix}
\partial f_1/\partial y_1 & \cdots & \partial f_1/\partial y_r \\
\vdots & & \vdots \\
\partial f_r/\partial y_1 & \cdots & \partial f_r/\partial y_r
\end{pmatrix}
\]

is not zero at the point (a, b). There is a neighborhood U of the point a in R^n such that 
there are unique continuously differentiable functions Y_1(x), ..., Y_r(x) on U satisfying 


f_i(x, Y(x)) = 0 for i = 1, ..., r, and Y(a) = b. □ 


The partial derivatives of a complex polynomial f(x, y) are defined using the rules 
of calculus. But we can also write everything in terms of the real and imaginary parts, say 
x = x_0 + x_1 i, y = y_0 + y_1 i, where x_0, x_1, y_0, y_1 are real variables, and f = f_0 + f_1 i, 
where f_i = f_i(x_0, x_1, y_0, y_1) is a real-valued function of the four real variables. Since f is a 
polynomial in x and y, the real functions f_i are polynomials in the real variables x_i and y_i. 
So they have continuous partial derivatives. 


Lemma A.4.2 Let f(x, y) be a polynomial in two variables with complex coefficients. Then 
with notation as above, 


(a) ∂f/∂y = ∂f_0/∂y_0 + (∂f_1/∂y_0) i, and 
(b) (Cauchy-Riemann equations) ∂f_0/∂y_0 = ∂f_1/∂y_1 and ∂f_0/∂y_1 = −∂f_1/∂y_0. 


Proof. One can use the product rule to verify these formulas. Suppose that f = gh. Then 
f_0 = g_0 h_0 − g_1 h_1 and f_1 = g_0 h_1 + g_1 h_0. If the formulas are true for g and h, they follow 
for f. So it is enough to verify the lemma for the functions f = y and f = x, for which they 
are obvious. □ 


Theorem A.4.3 Implicit Function Theorem for Complex Polynomials. Let f(x, y) be a 
complex polynomial. Suppose that for some (a, b) in C^2, f(a, b) = 0 and ∂f/∂y (a, b) ≠ 0. 
There is a neighborhood U of a in C on which a unique continuous function Y(x) exists 
having the properties 


f(x, Y(x)) = 0 and Y(a) = b. 


Proof. We reduce the theorem to the real Implicit Function Theorem A.4.1. The same 
argument will apply when there are more variables. 


With notation as above, we are to solve the pair of equations f_0 = f_1 = 0 for y_0 and 
y_1 as functions of x_0 and x_1. To do this, we show that the Jacobian determinant 

\[
\det \begin{pmatrix}
\partial f_0/\partial y_0 & \partial f_0/\partial y_1 \\
\partial f_1/\partial y_0 & \partial f_1/\partial y_1
\end{pmatrix}
\]

is not zero at (a, b). By hypothesis, f_i(a_0, a_1, b_0, b_1) = 0. Also, since ∂f/∂y (a, b) ≠ 0, Lemma 
A.4.2(a) tells us that d_0 = ∂f_0/∂y_0 and d_1 = ∂f_1/∂y_0 are not both zero. Part (b) of the lemma shows 
that the Jacobian determinant is 

\[
\det \begin{pmatrix} d_0 & -d_1 \\ d_1 & d_0 \end{pmatrix} = d_0^2 + d_1^2 > 0.
\]

This shows that the hypotheses of the Implicit Function Theorem (A.4.1) are satisfied. □ 
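
The computation in this proof can be checked numerically on an example. The sketch below is an illustration only: the polynomial f(x, y) = y^2 − x, the point (a, b), and the finite-difference step h are all assumptions chosen for the example, and the real Jacobian is approximated by central differences rather than computed symbolically.

```python
def f(x: complex, y: complex) -> complex:
    """Sample polynomial f(x, y) = y^2 - x (an illustrative choice)."""
    return y * y - x


def df_dy(x: complex, y: complex) -> complex:
    """Complex partial derivative of the sample polynomial with respect to y."""
    return 2 * y


def real_jacobian(x: complex, y: complex, h: float = 1e-6):
    """Central-difference Jacobian of (f_0, f_1) with respect to (y_0, y_1)."""
    d_y0 = (f(x, y + h) - f(x, y - h)) / (2 * h)            # derivative in y_0
    d_y1 = (f(x, y + 1j * h) - f(x, y - 1j * h)) / (2 * h)  # derivative in y_1
    return [[d_y0.real, d_y1.real],
            [d_y0.imag, d_y1.imag]]


a = 1 + 2j
b = a ** 0.5                       # a point with f(a, b) = 0
J = real_jacobian(a, b)
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
d = df_dy(a, b)
print(det, abs(d) ** 2)            # the two numbers agree up to rounding
```

The determinant of the real 2 × 2 Jacobian agrees with d_0^2 + d_1^2, the squared absolute value of the complex derivative ∂f/∂y, which is why it is positive whenever that derivative is not zero.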


EXERCISES 


Section A.1 About Proofs 


A.1. Use induction to find a closed form for each of the following expressions. 


(a) 1 + 3 + 5 + ··· + (2n + 1) 
(b) 1^2 + 2^2 + 3^2 + ··· + n^2 


A.2. Prove that 1^3 + 2^3 + ··· + n^3 = (n(n + 1))^2/4. 
A.3. Prove that 1/(1·2) + 1/(2·3) + ··· + 1/(n(n + 1)) = n/(n + 1). 
A.4. Let φ: S → T be a surjective map between finite sets. Prove by induction that |S| ≥ |T| 
and that if |S| = |T|, then φ is bijective. 
A.5. Let n be a positive integer. Show that if 2^n − 1 is a prime number, then n is prime. 
A.6. Let a_n = 2^(2^n) + 1. Prove that a_n = a_0 a_1 ··· a_(n−1) + 2. 
A.7. A nonconstant polynomial with rational coefficients is called irreducible if it is not 
a product of two nonconstant polynomials with rational coefficients. Prove that every 
polynomial with rational coefficients can be written as a product of irreducible 
polynomials. 


Section A.2 The Integers 


A.8. Prove that every natural number n except 1 has the form m′ for some natural 
number m. 
A.9. Prove the following laws for the natural numbers. 


(a) the commutative law for addition, 

(b) the associative law for multiplication, 

(c) the distributive law, 

(d) the cancellation law for addition: if a + b = a + c, then b = c. 


A.10. The relation < on N can be defined by the rule a < b if b = a + n for some n. Assume 
that properties of addition have been proved. 


(a) Prove that if a < b, then a + n < b + n for all n. 
(b) Prove that the relation < is transitive. 
(c) Prove that if a and b are natural numbers, then a < b, or a = b, or b < a. 


A.11. Assume that basic properties of the relation < on N are known (see Exercise A.10). Prove 
the principle of complete induction: A subset S of N is equal to N if it has the following 
property: If n is an element of N such that m is in S for every m < n, then n is in S. 


Section A.3 Zorn’s Lemma 


A.12. Let S be a partially ordered set. 


(a) Prove that if S contains an upper bound b, then b is unique, and also b is a maximal 
element. 


(b) Prove that if S is totally ordered, then a maximal element m is an upper bound for S. 


A.13. Use Zorn’s Lemma to prove that every ideal I of a ring R that is not R itself is contained 
in a maximal ideal. 


Section A.4 The Implicit Function Theorem 


A.14. Prove Lemma (A.4.2). 
A.15. Let f(x, y) be a complex polynomial. Assume that the equations 


f = 0, ∂f/∂x = 0, ∂f/∂y = 0 


have no common solution in C^2. Prove that the locus f = 0 is a manifold of dimension 2. 


Notation 


⟨A⟩         the class of the ideal A (13.7.2) 
A^t         the transpose of the matrix A (1.3.1) 
A_n         the alternating group (2.5.6) 
C           the field of complex numbers (2.2.2) 
C_n         the cyclic group of order n (6.4.1) 
C(x)        the conjugacy class of the element x (7.2.3) 
cof(A)      the cofactor matrix of the matrix A (1.6.7) 
D_n         the dihedral group (6.4.1) 
det A       the determinant of the matrix A (1.4.1) 
e_i, e_ij   a standard basis vector (1.1.24), a matrix unit (1.1.21) 
F^n         the space of n-dimensional column vectors with entries in F (3.3.6) 
F^(m×n)     the space of m × n matrices with coefficients in F (3.3.6) 
F_p         the field of integers modulo p (3.2.4) 
GL_n        the general linear group (2.2.4) 
I           the identity matrix (1.1.11), the icosahedral group (6.12.1) 
im φ        the image of the map φ (2.5.4) 
ker φ       the kernel of the homomorphism φ (2.5.5), (4.1.5) 
K^H         a fixed field (16.5.1) 
ℓ^∞         the space of bounded sequences (3.7.2) 
M, M_n      the group of isometries of the plane, of n-space (Section 6.2) 
N           the set of positive integers, also called natural numbers (A.2.1) 
N(H)        the normalizer of the subgroup H (7.6.1) 
n!          n factorial: the product of the integers 1, 2, ..., n. 
(n k)       a binomial coefficient (A.1.1) 
O_n         the orthogonal group (6.7.3), (9.1.2) 
O_(3,1)     the Lorentz group (9.1.5) 
PSL_n       the projective group (9.8.1) 
R           the field of real numbers (2.2.2) 
R^+         the additive group of R (2.1.1) 
R^×         the multiplicative group of invertible elements of R (2.1.1) 
S_n         the symmetric group (2.2.5) 
S^n         the n-dimensional sphere (Section 9.2) 
SL_n        the special linear group (2.2.11), (9.1.3) 
SO_n        the special orthogonal group (5.1.11), (9.1.3) 
SP_2n       the symplectic group (9.1.4) 
SU_n        the special unitary group (9.1.3) 
T           the tetrahedral group (6.12.1) 
U_n         the unitary group (8.3.14), (9.1.3) 
<x>         the subgroup generated by the element x (2.4.1) 
Z           the center of a group (2.5.12) 
ℤ           the ring of integers (2.2.2) 
Z(x)        the centralizer of the element x (7.2.2) 
ζ_n         the nth root of unity e^(2πi/n) (12.4.7) 
⌊μ⌋         the largest integer ≤ μ: the floor of μ (13.7.7) 
ω           the cube root of unity e^(2πi/3) (10.4.14) 
≈           indicates that two structures are isomorphic, as in G ≈ G′ (2.6.3) 
≡           congruence, as in a ≡ b modulo n (2.9.1), see also (2.8.2), (2.7.14) 
*           If A is a complex matrix, then A* is the adjoint matrix Ā^t (8.3.5). 
            In a matrix display, * denotes an undetermined entry. 
            The starred exercises are some of the more difficult ones. 
⊕           direct sum (3.6.5), (14.7.2) 


If S and T are sets, we use the following notation: 


|S|         the number of elements, the order, of the set S 
[S]         the subset S, when it is regarded as an element of a set of subsets (2.7.8) 
s ∈ S       s is an element of S. 
S ⊂ T       S is a subset of T, or S is contained in T. In other words, every element 
            of S is also an element of T. 
T ⊃ S       T contains S, which is the same as S ⊂ T. 
S < T       S is a proper subset of T, meaning that it is a subset, and T contains 
            an element that is not a member of S. 
T > S       This is the same as S < T. 
S ∩ T       the intersection of the sets: the set of all elements in common to S and T. 
S ∪ T       the union of the sets: the set of all elements that are contained in at 
            least one of the sets S or T. 
S × T       the product set. Its elements are ordered pairs (s, t), with s in S and t in T. 
φ: S → T    a map φ from S to T, a function whose domain is S and whose range is T. 
s ⇝ t       This wiggly arrow indicates that the map under consideration sends 
            the element s to the element t, i.e., that φ(s) = t. 
□           This symbol indicates that a digression in the text, such as a proof or 
            an example, has ended, and that the text returns to the main thread. 


A 


Abelian groups, 40, 81, 412-13, 421 

finite, 431 

free, 225 

infinite, 41 

Structure Theorem for, 429-30 
Abstract symmetry, 176-78, 190-91 
Addition 

of matrices, 2 

of relations, 337-38 

vector, 78 
Adjoint matrix, 233 
Adjoint operator, 242 
Adjoint representation, 289 
Affine group, 288 
Algebraically closed field, 471 
Algebraic element, 443-46, 472 
Algebraic extension, 473 
Algebraic geometry, 347-53 
Algebraic integers, 383-85, 408 

factoring, 385-87 
Algebraic number, 383 
Algebraic number field, 442 
Algebraic variety, 347 
Alternating group, 49, 63 
Angle 

of rotation, 171 

between vectors, 242 
Antipodal point, 269 
Ascending chain condition, 426 
Associative law, 5, 68, 176 

for addition, 517 

for congruence classes, 61 

for scalar multiplication, 90 
Augmented matrix, 12 
Automorphism, 52, 176 

F-automorphism, 484 

inner, 193 


R-automorphism, 477 
of ring, 355 
Averaging, over a group, 294 


Axiom of choice, 98, 348, 518. See also 


Zorn’s Lemma 
Axis of rotation, 134 


B 


Basechange matrix, 93-94 
Base point, 468 
Bases, 86-91, 99-100 
change of, 93-95 
computing with, 90-91, 100 
defined, 88 
infinite, 98 
lattice, 169, 405 
of module, 415 
orthogonal, 252 
orthonormal, 133, 240, 252 
standard, 88, 415 
Berlekamp algorithms, 374, 382 
Bézout bound, 349 
Bilateral symmetry, 154 
Bilinear form, 229-60 
Euclidean space, 241-42 
Hermitian form, 232-35 
Hermitian space, 241-42 
orthogonality, 235-41 
skew-symmetric form, 249-52 
spectral theorem and, 242-45 
symmetric form, 231-32 
Binomial coefficient, 513 
Block multiplication, 8-9 
Branched covering, 351 
cut and paste, 465-68 
isomorphism of, 464 
Branch points, 351, 353 
Burnside’s formula, 194 


C 


Cancellation law, 41-43, 82-83, 343, 392 
Canonical map, 66, 335, 423 
Cardano’s formula, 501 
Cartesian coordinates, 452 
Case analysis, 513 
Cauchy-Riemann equations, 520 
Cauchy’s Theorem, 375 
Cayley-Hamilton theorem, 140 
Cayley’s theorem, 195 
Celestial sphere, 264 
Center 
of group, 196 
of p-group, 197 
Center of gravity, 166 
Centroid, 166. See also Center of gravity 
Change of basis, 93-95 
Character, 291, 298-303 
dimension of, 299 
Hermitian product on, 299 
irreducible, 299 
one-dimensional, 303-4 
table, 302 
Characteristic polynomial, 113-16 
of linear operator, 115 
Characteristic subgroup, 225 
Characteristic zero, 83, 484 
Chinese Remainder Theorem, 73, 356, 378 
Circle group, 262, 320 
Circulant, 258 
Class 
congruence, 60 
ideal, 388, 396-99, 410 
Class equation, 195-97 
of icosahedral group, 198-200 
Class function, 300 
Class group, 399-402, 410 
Class number, 396 
Closure in subgroups, 42-43 
Cofactor matrix, 29-31 
Column index, 1 
Column rank, 108 
Column space, 87, 104 
Column vector, 2 
Combination, linear, 7, 79, 86, 97 


Common zeros, 347 
Commutative law, 5-6 

for congruence classes, 61 
Commutative diagram, 105 
Commutator subgroup, 225 
Compact groups, 311 
Complete expansion, of determinants, 29 
Complete induction, 515, 521 
Complete set of relations, 215, 424 
Complex algebraic group, 282 
Complex line, 347 
Complex representations, 293 
Congruence, 60 
Conics, 245-49 

degenerate, 245 

nondegenerate, 246 
Conjugacy class, 196 
Conjugate representation, 293 
Conjugate subgroups, 72, 178, 203 
Conjugation, 52, 195 

in symmetric group, 200-203 
Connected component, 76 
Constructible point, line, circle, 451-54 
Construction, ruler and compass, 450-55 
Continuity, proof by, 138-40 
Contradiction, proofs by, 515 
Coordinates, 90 

change of, 158-59 
Coordinate system, 159 
Coordinate vectors, 78, 93, 94, 105, 416 
Correspondence Theorem, 61-64, 336-37, 414 

proof of, 63-64, 336 
Coset, 56-59 

double, 76 

left, 49, 56 

operation on, 178-80 

right, 58-59, 216 
Counting formula, 57, 58, 62, 180-81, 185 
Covering space, 351 
Cramer’s Rule, 415, 417 
Crystallographic group, 187 
Crystallographic restriction, 171-72 
Cubic, resolvent, 496 
Cubic equations, 492-93, 507-8 
Cubic extensions, 446 


Cusp, 351 

Cut and paste, 465-68 

Cycle notation, 24 

Cyclic group, 46-47, 163, 183, 208 
generator for, 84 
infinite, 47 
of order n, 46 

Cyclic R-module, 432 

Cyclotomic polynomial, 374 


D 


Defining relations, 212 
Degenerate conic, 245 
Degree 
of field extension, 446-49 
of a monomial, 327 
multiplicative property of, 447 
total, 327 
weighted, 482 


Determinant homomorphism, 49, 56, 62 


Determinant, 7, 18-24 
complete expansion of, 29 
formulas for, 27-31 


multiplicative property of, 21-24 


of permutation matrix, 27 

recursive definition of, 20 

of R-matrix, 414 

uniqueness of, 20-21 

Vandermonde, 511 
Diagonal entries, 6 
Diagonal form, 116-19 
Diagonalizable matrix, 117 
Diagonalizable operator, 119 
Diagonal matrix, 6 
Dichotomy, 513 
Differential equations, 141-45, 151 
Dihedral group, 163, 183, 316 
Dimension, 86—91 

of character, 299 

of vector space, 90 

of linear group, 262 
Dimension formula, 102-4 
Direct sums, 95—96, 295 

of modules, 429 

of submodules, 430 


Index 


Discrete group, 167-72 

Discrete subgroup, 168 

Discriminant, 481-83 

Distinct, 17 

Distributive law, 5, 81, 324 
for congruence classes, 61 
for matrix multiplication, 147 
for vector spaces, 84 

Divide and conquer, 513 

Divisor 
greatest common, 44-45, 334, 359, 362 
zero, 343 
Domain 
Euclidean, 361, 376 
factorization, 360-67, 379, 400 
integral, 343 
principal ideal, 361 
unique factorization, 364 
Dot product, 132, 229 
Double coset, 76 


E 


Eigenspace, 126 

generalized, 131 
Eigenvalue, 111, 113, 114, 116, 234 
Eigenvectors, 110-13, 116, 124 

generalized, 120 

positive, 112 
Eisenstein criterion, 373-74 
Elementary integer matrix, 418 
Elementary matrix, 10-12, 77 
Elementary row operation, 10 
Elementary symmetric function, 478 
Elements 

adjoining, 338-41 

algebraic, 443-46 

inverse image of, 55 

irreducible, 444 

maximal, 518 

norm of, 386 

prime, 360 

primitive, 462-63 

relatively prime, 362 

representative, 55 


solvable, 502 
stabilizer of, 177-78 
transcendental, 443-46 
zero, 417 
Ellipse, 246 
Ellipsoid, 248, 269 
Equation, 4 
Cauchy-Riemann, 520 
class, 195-97 
cubic, 492-93 
differential, 141—45 
homogeneous, 15, 88, 92 
quartic, 493-97 
quintic, 502-5 
Equator, 265, 267 
Equivalence relation, 52-56 
defined, 53 
defined by a map, 55-56 
reflexive, 53 
symmetric, 53 
transitive, 53 
Euclidean Algorithm, 45, 367 
Euclidean domain, 361, 376 
Euclidean space, 241-42 
standard, 241 
Euler’s theorem, 137-38 
Exceptional group, 283 
Expansion by minors, 19, 28 
on the ith row, 28 
Extension 
algebraic, 472 
cubic, 446 
field, 442 
finite, 446 
Galois, 485, 488-89 
Kummer, 500-502 
ring, 338 


F 
Factoring, 359-82 


algebraic integers, 385-87 


Gauss primes, 376-78 
Gauss’s lemma, 367-71 
ideals, 392—94, 409 


integer polynomials, 371-75, 380-81 
integers, 359, 378 
unique factorization domains, 360-67 
Factorization 
ideal, 391 
irreducible, 364, 365 
prime, 365 
Faithful operation, 182 
Faithful representation, 291 
F-automorphism, 484 
Fermat’s theorem, 99 
Fibonacci numbers, 152 
Field extension, 442 
algebraic, 486 
degree of, 446-49 
isomorphism of, 445, 484-86 
Fields, 80-84, 98-99, 442-76 
adjoining roots, 456-59 
algebraically closed, 471 
algebraic and transcendental elements, 
443-46 
characteristic of, 83 
finding irreducible polynomials, 449-50 
finite, 442, 459-62 
fixed, 486-88 
function, 442-43, 463-71 
intermediate, 488 
number, 442 
quadratic number, 383-411 
of rational functions, 344 
real quadratic, 402-5 
ruler and compass constructions, 
450-55 
splitting, 483-84 
tangent vector, 280 
Finite abelian group, 431 
Finite-dimensional vector space, 89 
dimension of, 90 
subspaces of, 95 
Finite extension, 446 
Finite field, 442, 459-62 
order of, 459 
Finite group, 41 
homomorphism of, 58 
of orthogonal operators on plane, 
163-67 


Finitely generated module, 415 
Finite simple group, 283 
Finite subgroups of rotation group, 183-87 
First Isomorphism Theorem, 68-69, 215, 
335, 414, 432, 492 
Fixed field, 486-88 
Fixed Field Theorem, 487-88 
Fixed point theorem, 166, 198 
Fixed vector, 111 
Form 
Hermitian, 232-35 
Killing, 289 
Lorentz, 231 
matrix of, 230 
nondegenerate, 236, 252 
quadratic, 246 
rational canonical, 435 
signature of, 240 
skew-symmetric, 230, 249-52 
symmetric, 230 
Fourier matrix, 260 
Fractions, 342-44 
Free abelian group, 225 
Free group, 210-11 
mapping property of, 214 
Free modules, 412, 437 
submodules of, 421-23 
Frobenius map, 355, 511 
Frobenius reciprocity, 321 
Function field, 442-43, 463-71 
cut and paste, 465-68 
Functions 
rational, 487 
successor, 516 
symmetric, 477-81 
Fundamental domain, 193 
Fundamental Theorem 
of Algebra, 471 
of Arithmetic, 359, 363 


G 


Galois extension, 485, 488-89 
characteristic properties of, 488-89 
Galois group, 485 
of a polynomial, 489 
Galois theory, 477-512 
for a cubic, 493 
cubic equations, 492-93 
discriminant, 481-83 
fixed fields, 486-88 
isomorphisms and field extensions, 
484-86 
Kummer extensions, 500-502 
Main Theorem, 489-92 
quartic equations, 493-97 
quintic equations, 502-5 
roots of unity, 497—500 
splitting fields, 483-84 
symmetric functions and, 477-81 
Gauss integer, 323, 386 
Gauss prime, 376-78, 394 
Gauss’s lemma, 367-71 
Generalized eigenspace, 131 
Generalized eigenvector, 120 
General linear group, 8, 41 
integer, 418 
over R, 414 
Generators, 212-16, 225-26, 423-26, 438 
Jordan, 122 
of a module, 415 
Geometry, algebraic, 347-53, 357-58 
Glide reflection, 160 
Glide symmetry, 155 
Gram-Schmidt procedure, 241 
Greatest common divisor, 44, 334, 359, 362 
Group homomorphism, 48 
Group operation, 176-78 
Group representation, 290-322 
Groups, 37-77 
abelian, 40, 81, 412-13, 421 
affine, 288 
alternating, 49, 63 
averaging over, 294 
center of, 50, 196 
circle, 262 
compact, 311 
complex algebraic, 282 
correspondence theorem, 61-64 
cosets, 56-59 
crystallographic, 187 
cyclic, 46-47, 64, 163, 183 
defined, 40 

defining relations for, 42 

dihedral, 163, 183 

discrete, 167-72 

equivalence relations and partitions, 
52-56 

exceptional, 283 

finite, 41, 163—67 

finite simple, 283 

free, 210-11 

free abelian, 225 

Galois, 485 

general linear, 41 

homomorphisms, 47-51 

homophonic, 77 

icosahedral, 183 

infinite, 41 

isomorphic, 51 

isomorphism of, 51-52 

laws of composition, 37-40 

linear, 261-89 

Lorentz, 262 

Mathieu, 283 

modular arithmetic, 60-61 

multiplicative, 84 

nonabelian, 222 

octahedral, 183 

one-parameter, 272-75 

operation of, 293 

opposite, 70 

order of, 40 

orthogonal, 134, 261 

p-groups, 197-98 

plane crystallographic, 172-76 

point, 170-71 

product group, 64-66 

projective, 280 

quotient, 66-69, 74-75 

representation of, 292 

rotation, 137, 269-72 

simple, 199 

special linear, 43, 50 

spin, 269 

sporadic, 283 

surjective, 62 


symmetric, 41, 50, 197 

symplectic, 261 

tetrahedral, 183 

translation, 168-70 

translation in, 277-80 

triangle, 226 

two-dimensional crystallographic, 
172 

unitary, 235, 261 


H 


Half integer, 384 

Half space, 259 

Hausdorff space, 351 

Hermitian form, 232-35, 254 
standard, 232 

Hermitian matrix, 233 

Hermitian operator, 257 


Hermitian product, 299 


Hermitian space, 241-42, 256 
standard, 241 
Hermitian symmetry, 233 
Hilbert Basis Theorem, 428—29 
Hilbert Nullstellensatz, 345 
Homeomorphism, 262 
Homogeneity in a group, 277 
Homogeneous linear equation, 15, 
88, 92 
Homogeneous polynomial, 328 
Homomorphism, 47-51, 158 
determinant, 49, 56, 62 
group, 48 
image of, 48-49 
kernel of, 49, 56, 62, 69, 331, 
413 
restriction of, 61 
of modules, 413 
of rings, 328-34 
of R-modules, 427 
spin, 269 
trivial, 48 
Homophonic group, 77 
Hyperbola, 246 
Hyperplane, 259 
Hypervector, 86 


I 
Icosahedral group, 183 
class equation of, 198-200 
Ideal, 331, 387 
factorization, 391-94 
generated by a set, 332 
of leading coefficients, 428 
maximal, 344-47, 394 
prime, 392, 394-96 
principal, 331 
product, 355, 390 
proper, 331 
unit, 331 
zero, 331 
Ideal class, 388, 396-99 
Ideal multiplication, 389-92 
Idempotent, 341 
Identities, 5, 417-18 
Newton, 505 
Identity element, 42 
Identity matrix, 6 
Image, of homomorphism, 413 
Imaginary quadratic number field, 
383 
Implicit Function Theorem, 522 
Inclusion, ordering by, 518 
Inclusion map, 48 
Indefinite form, 231 
Independence, 87, 95, 97, 415 
Independent subspaces, 95 
Index, multiplicative property of, 58 
Induced law, 42 
Induced representation, 321 
Induction, 513-516 
Inductive definition, 517 
Inductive set, 518 
Infinite basis, 98 
Infinite cyclic group, 47 
Infinite-dimensional space, 96-98 
Infinite group, 41 
Infinite order, 47 
Infinite set, span of, 97 
Inner automorphism, 193 
Integer general linear group, 418 
Integer matrix 
diagonalizing, 418-23 




elementary, 418 
invertible, 418 
Integer polynomials, factoring, 
371-75 
Integers, 390, 516-17 
algebraic, 383-85 
factoring, 378 
Gauss, 323, 386 
half, 384 
modulo, 66 
next, 516 
norm of, 397 
prime, 64, 394-96 
ring of, 384 
square-free, 384 
subgroups of additive group of, 
43 
successor, 516 
Integral domain, 343 
Intermediate field, 488 
Intersection, 527 
Invariant 
form, 297 
operator, 307 
subspace, 110, 294 
vector, 294 
Inverse, 7, 40 
Inverse image, 55 
left, right, 7 
Invertible integer matrix, 418 
Invertible matrix, 7, 15 
Invertible operator, 109 
Irreducible character, 299 
Irreducible element, 444 
Irreducible factorization, 364 
Irreducible polynomial, 350, 383, 
443, 458 
finding, 449-50 
Irreducible representation, 294-96 
Isometry, 156-59 
discrete group of, 167-72 
fixed point of, 162 
orientation-preserving, 160 
orientation-reversing, 160 
of the plane, 159-63 
Isomorphic groups, 51 
Isomorphism, 51-52 

of branched coverings, 464 

of field extensions, 445, 464, 484-86 

of groups, 51-52 

modules and, 413 

of representations, 293, 307 

of rings, 328 

of vector spaces, 85, 91 
Isomorphism class of a group, 52 


J 


Jacobi identity, 276 
Jordan block, 121, 148 
Jordan form, 120-25, 148 
Jordan generators, 122 


K 


Kaleidoscope principle, 167 
Kernel 
of homomorphism, 49, 56, 62, 413 
of ring homomorphism, 331 
Killing form, 289 
Klein Four Group, 47, 65, 490, 493, 503 
Kronecker delta, 133 
Kronecker-Weber Theorem, 500 
Kummer extensions, 500-502 


L 


Lagrange interpolation formula, 17, 
380 

Lagrange’s theorem, 57 

Latitude, 265-66 

Lattice, 403, 405-8 

Lattice basis, 169, 405 

Laurent polynomials, 356 

Law of composition, 37-40 
associative, 37 
commutative, 38 
identity for, 39 

Law of cosines, 242 

Leading coefficients, 325 
ideal of, 428 

Left coset, 49, 56 

Left multiplication, 195, 277-78 


by G, 177 
Left translation, 277 
Lie algebra, 275-77, 286 
Lie bracket, 276 
Linear algebra, in ring, 412-41 
free modules, 414-17 
generators and relations, 423-26 
linear operators and, 432-35 
modules, 412-14 
noetherian rings, 426—29 
polynomial rings in several variables, 
436 
structure of abelian groups, 429-32 
Linear combination, 9, 79, 86, 97 
Linear equation, homogeneous, 15, 88, 91 
Linear group, 261-89 
classical groups, 261-62 
dimension of, 262 
integer general, 418 
Lie algebra, 275-77 
normal subgroups of SL2, 280-83 
one-parameter groups, 272-75 
rotation group SO3, 269-72 
special unitary group SU2, 266-69 
spheres and, 263-66 
translation in group, 277-80 
Linear operator, 102-31, 293, 432-35 
applications of, 132-53 
characteristic polynomial of, 113-16, 
115 
defined, 108-10 
dimension formula, 102-4 
eigenvectors, 110-13 
Jordan form, 120-25 
left shift, right shift, 109 
triangular and diagonal form, 
116-19 
Linear relation, 103 
among vectors, 87 
Linear transformation, 102 
matrix of, 104-8 
Longitude, 265-66 
Lorentz form, 231 
Lorentz group, 262 
Lorentz transformation, 262 
Lüroth’s Theorem, 488 


M 


Main Lemma, 392 
Main Theorem of Galois theory, 
489-92 
Manifold, 278 
Mapping property 
of free groups, 214 
of quotient groups, 214 
of quotient modules, 413 
of quotient rings, 335, 343 
Maps 
canonical, 66, 335, 423 
equivalence relation defined by, 
55-56 
Frobenius, 355 
surjective, 54 
well defined, 180 
zero, 328
Maschke’s theorem, 296, 298 
Mathieu group, 283 
Matrix, 1-36 
addition of, 2 
adjoint, 233 
augmented, 12 
basechange, 94 
block multiplication, 8-9 
cofactor, 29-31 
determinant of, 7, 18-24 
diagonal, 6, 117, 146 
diagonal entries in, 6 
diagonalizable, 117, 124
elementary, 10-12 
elementary integer, 418 
Fourier, 260 
Hermitian, 233 
identity, 6 
integer, 418-23 
invertible, 7, 15 
of linear transformation, 104-8 
multiplication of, 2-3, 78 
nonzero, 9 
normal, 242 
orthogonal, 132-38 
permutation, 24-27, 51 
of polynomials, 432 




positive, 112 
presentation, 423 
R-matrix, 414 
rotation, 108, 134 
row echelon, 13-15 
row reduction of, 10-17 
scalar multiplication of, 2 
self-adjoint, 233 
skew-Hermitian, 267 
square, 2, 8 
unitary, 235, 244-45 
upper triangular, 6 
zero, 6 
Matrix entries, 1 
Matrix exponential, 145-50, 
278 
Matrix multiplication, 2-4 
Matrix notation, 4, 86 
Matrix of form, 230 
Matrix of transformation, 105 
Matrix product, 3 
Matrix representation, 290 
Matrix transpose, 17-18 
Matrix units, 9-10 
Maximal element, 518 
Maximal ideal, 344-47, 394 
Minors, 19 
expansion by, 19 
Modular arithmetic, 60-61 
Modules, 412-14
basis of, 415 
direct sum of, 429 
finitely generated, 415 
free, 412, 414-17 
generators of, 415 
homomorphism, 413 
isomorphism, 413 
rank of, 416 
of relations, 424 
R-module, 412 
Structure Theorem for, 
432-35 
Monic polynomial, 325, 340 
Monomial, 325, 327 
Multi-index, 327 
Multiple root, 458 
Multiplication

block, 8-9 

ideal, 389-92 

left, 177, 195, 277-78 

of matrices, 78 

matrix, 2-4 

right, 216 

scalar, 2, 5, 78, 90 

table, 38 
Multiplicative group, structure of, 84 
Multiplicative property 

of degree, 447 

of index, 58 

of the determinant, 21-24 
Multiplicative set, 357 


N 


Natural number, 516 
n-dimensional sphere (n-sphere),
263 
Negative definite, 231 
Negative semidefinite, 231 
Newton’s identities, 505 
Nilpotent, 122, 127, 355 
Node, 351 
Noetherian ring, 426-29 
Nonabelian group, 222 
Noncommutative ring, 324 
Nondegeneracy on a subspace, 252 
Nondegenerate form, 236, 252 
Nonsingular point, 358 
Nonzero, 9 
Norm 
of an element, 386, 403 
of an ideal, 397 
Normalizer, 203 
Normal matrix, 242 
Normal subgroup, 66 
generated by a set, 212 
North pole, 263, 264 
Notation 
cycle, 24 
fraction, 40, 343-44 
matrix, 4, 86 
power, 40 


sigma, 4 

summation, 5, 28 
Nullity, 103 
Nullspace, 79, 103 
Null vector, 236, 252 
Number field, 442 

algebraic, 442 


O 


Octahedral group, 183 
One-dimensional character, 303-4 
One-parameter group, 272-75 
Operation 
on cosets, 178-80 
faithful, 182 
of a group, 176-78, 293 
partial, 217, 218 
on subsets, 181 
Operator 
adjoint, 242 
determinant of, 118 
diagonalizable, 117 
Hermitian, 244 
invertible, 109 
linear, 110, 293, 432-35 
normal, 242 
nilpotent, 122, 127 
orientation-preserving, 159 
orientation-reversing, 159 
orthogonal, 134, 162, 245 
self-adjoint, 243 
shift, 109, 434 
singular, 109 
symmetric, 245 
trace of, 118 
unitary, 242 
Opposite group, 70 
Orbit, 166, 177, 185 
Orbit sum, 477 
Order 
of finite field, 459 
of group, 40, 208-10 
by inclusion, 518 
partial, 518 
total, 518 


Ordered set, 86 
Orientation, 159 
Orientation-preserving isometry, 160 
Orientation-reversing isometry, 160 
Orthogonal basis, 252 
Orthogonal group, 134, 261 
Orthogonality, 235-41, 254-56 
Orthogonality relations, 300 

proof of, 309-11 
Orthogonal matrix, 132-38 
Orthogonal operator, 134, 245 
Orthogonal projection, 238-41 
Orthogonal representation, 269 
Orthogonal space, 236 

to a subspace, 252 
Orthogonal sum, 237 
Orthogonal vectors, 252 
Orthonormal basis, 133, 240 


P 


Parabola, 246 

Parallelogram law, 256 
for vector addition, 112 

Partial operation, 217, 220 

Partial ordering, 518 

Partition, 52-56, 57 

Peano’s axioms, 516-17 

Permutation matrix, 26, 51 
determinant of, 27 

Permutation representation, 181-83, 
Permutation, 24-27, 41, 50, 201
cycle notation, 24 
representation, 181-83, 192 
symmetric group, 24 
transposition, 25 

p-group, 197-98 

Pick’s Theorem, 411 

Plane algebraic curve, 350 


Plane crystallographic group, 172-76, 189-90 


Point group, 170-71 
Point, 163 
base, 468 
branch, 351, 353 
Polar decomposition, 259, 287 
Pole, 184, 186
north, 263, 264 
Polynomial ring, 325-28, 432-35 
in several variables, 436, 440 
Polynomial, 85, 327 
characteristic, 113-16, 197 
complex, 520 
constant, 325 
cyclotomic, 374 
discriminant of, 481-83 
homogeneous, 328 
integer, 380-81 


irreducible, 350, 383, 443, 449-50, 458 


Laurent, 356 

matrix of, 432 

monic, 325, 340 

paths of, 101 

primitive, 368, 371 

quadratic, 247 

quartic, 495 

ring, 325-28

roots of, 116 

symmetric, 477 
Positive combination, 259 
Positive definite, 229, 231, 232, 234 
Positive eigenvector, 112 
Positive matrix, 112 
Power notation, 40 
Presentation matrix, 423 
Prime 

Gauss, 376-78, 381, 394 

ramified, 395 

split, 395 
Prime element, 360 
Prime factorization, 365 
Prime ideal, 392, 394-96 
Prime integer, 64, 394-96 
Primitive element, 462-63 
Primitive Element Theorem, 462-63 
Primitive polynomial, 368, 371 
Primitive root, 84 
Principal ideal, 331 
Principal ideal domain, 361 
Product group, 64-66, 74 
Product ideal, 355, 390 
Product matrix, 3 
Product permutation, 24
Product ring, 341-42 
Product rule, 142 
Product set, 67, 527 
Projection, 64 
orthogonal, 238-41 
stereographic, 263 
Projective group, 280 
Proper ideal, 331 
Proper subgroup, 43 
Proper subspace, 79 
Pythagoras’ theorem, 133 


Q 


Quadratic form, 246 
Quadratic number field, 383-411 
algebraic integer, 383-85 
class group, 396-99 
factoring algebraic integers, 385-87 
factoring ideals, 392-94 
ideal class, 396-99 
ideal multiplication, 389-92 
ideals, 387-89 
imaginary, 383 
lattices and, 405-8 
real, 402-5 
Quadric, 245-49
Quartic equation, 493-97 
Quartic polynomial, 495 
Quaternion algebra, 266, 288 
Quaternion group H, 47 
Quintic equation, 502-5 
Quotient group, 66-69, 74-75 
mapping property of, 214-15 
Quotient ring, 334-38 
mapping property of, 335, 343 


R 


Ramified prime, 395 
Rank, 103 
of a free module, 416 
Rational canonical form, 435 
Rational function, 342, 344, 487 
field of, 344 
R-automorphism, 477 


Real quadratic field, 402-5 
Recursive definition, 517 

of the determinant, 20 
Reducible representation, 295 
Reflection, 134, 160 

glide, 160 
Regular representation, 304-7 
Relations, 212-16, 423-26 

adding, 337-38

complete set of, 215 

defining, 212 

module of, 424 

orthogonality, 309-11 
Relation vector, 424 
Relatively prime elements, 362 
Representation 

adjoint, 289 

complex, 293 

conjugate, 293 

faithful, 291 

of a group, 290-92 

induced, 321 

irreducible, 294-96 

isomorphism of, 293, 307 

matrix, 290 

orthogonal, 269 

permutation, 181-83, 304 

reducible, 295 

regular, 304-7 

sign, 291 

standard, 291 

of SU2, 311-14 

trivial, 291 

unitary, 296-98 
Representative element, 55 
Residue, 330, 335 
Resolvent cubic, 496 
Restriction, 110, 181 

crystallographic, 171-72 

of homomorphism, 61 
Riemann Existence Theorem, 465 
Riemann surface, 350, 352, 464 
Right coset, 58-59, 216 
Right inverse, 7 
Right multiplication, 216 
Right shift operator, 109 


Rings, 323-58 
automorphism of, 355 
characteristic of, 334 
extension of, 338 
homomorphism of, 328-34 
ideals in, 328-34, 387-89 
of integers, 384 
linear algebra in, 412-41 
noetherian, 426-29 
noncommutative, 324 


polynomial, 325-28, 339, 432-35, 436 


product, 341-42 
quotient, 334-38 
unit of, 325 
zero, 324, 414 
R-matrix, 414 
determinant of, 414 
R-module, 412 
homomorphism of, 427 
Root 
adjoining, 456-59 
multiple, 458 
Root of unity, 497-500 
Rotation, 134, 160 
axis of, 134 
Rotational symmetry, 154 
Rotation group, 137 
finite subgroups of, 183-87 
SO3, 269-72
Rotation matrix, 108, 134 
Row echelon matrix, 13-15 
Row index, 1 
Row operation, 10 
elementary, 10 
Row rank, 108
Row reduction, 10-17 
Row vector, 2, 97, 108 


S 


Scalar multiplication, 2, 5, 78, 84, 90 
associative law for, 90 

Scalars, 2 

Schur’s lemma, 307-9 

Schwarz inequality, 256

Second Isomorphism Theorem, 227 
Self-adjoint matrix, 233
Self-adjoint operator, 243 
Semigroup, 75 
Sets 
independent, 87, 95, 97, 415 
inductive, 518 
ordered, 86 
product, 527 
Sheets, 465 
Shift operator, 434 
Sieve of Eratosthenes, 372 
Sigma notation, 4 
Signature of a form, 240 
Sign representation, 291 
Simple groups, 199 
Singular operator, 109 
Singular point, 358 
Size function, 360 
Skew-Hermitian matrix, 267 
Skew-symmetric form, 230, 249-52 
Solvable element, 502 
Space 
covering, 351 
Euclidean, 241-42 
Hermitian, 241-42 
Span, 86 
defined, 91 
of infinite set, 97 
Special linear group, 43, 50 
Spectral theorem, 242-45, 253 
for Hermitian operators, 244 
for normal operators, 244 
for symmetric operators, 245 
for unitary matrices, 244-45 
Sphere, 263-66 
celestial, terrestrial, 264 
Spin group, homomorphism, 269 
Split prime, 395 
Splitting field, 483-84 
Splitting Theorem, 484 
Sporadic group, 283 
Square-free integer, 384 
Square matrix, 2, 8 
Square system, 16-17 
Stabilizer, of element, 177-78 
Standard basis, 88, 415 
Standard representation, 291
Stereographic projection, 263 
Structure Theorem 
for abelian groups, 429-30 
for modules, 432-35 
uniqueness for, 431-32 
Subfield, 80 
Subgroup, 42 
of additive group of integers, 
43-46 
characteristic, 225 
commutator, 225 
conjugate, 72, 178, 203 
discrete, 168 
finite, 183-87 
normal, 66 
proper, 43 
of SL2, 280-83 
Sylow p-subgroups, 203 
trivial, 43 
zero, 422 
Submodule, 413 
direct sum of, 430 
of free modules, 421-23 
Subring, 323, 324 
Subsets, operation on, 181 
Subspace, 78-80, 85 
independent, 95 


linear transformation and, 102 


nondegenerate on a, 236 

orthogonal space to, 252 

proper, 79 

sum of, 95 
Substitution Principle, 329 
Successor function, 516 
Summation notation, 5, 28 
Surjective map, 54 
Sylow p-subgroups, 203 
Sylow theorems, 195, 203-7 
Sylvester’s law, 240, 256, 258 
Symbolic notation, 55 
Symmetric form, 229, 230 
Symmetric function, 477-81 

elementary, 478 
Symmetric Functions Theorem, 

479-81 


Symmetric group, 24, 41, 50, 197 

conjugation in, 200-203 
Symmetric operator, 245 

spectral theorem for, 245 
Symmetric polynomial, 477 
Symmetry, 154-94 

abstract, 176-78 

bilateral, 154 

glide, 155 

Hermitian, 233 

of plane figures, 154-56 

rotational, 154 

translational, 155 
Symplectic group, 261 
System, 4 

coordinate, 159 

square, 16-17 


T 


Tangent vector field, 280 
Terrestrial sphere, 264 
Tetrahedral group, 183 

Third Isomorphism Theorem, 227 
T-invariant, 110 


Todd-Coxeter Algorithm, 206, 216-20 


Total ordering, 518 
Trace, 116 
Transcendental element, 443-46 
Transformation 
Lorentz, 262 
Tschirnhausen, 482 
Translation, 156, 160 
in a group, 277-80, 286-87 
left, 277 
Translation group, 168-70 
Translation vector, 163 
Translational symmetry, 155 
Transpose, matrix, 17-18 
Transposition, 25 
Triangle group, 226 
Triangular form, 116-19 
Trivial homomorphism, 48 
Trivial representation, 291 
Trivial subgroup, 43 
Truncated polyhedron, 186 


Tschirnhausen transformation, 482 
Two-dimensional crystallographic group, 
172 


U 


Unbranched covering, 351 
Union, 527 
Unipotent, 355 
Unique factorization domain, 364 
Uniqueness of the determinant, 20-21 
Unit, of a ring, 325 
Unitary group, 235, 261 

SU2, 266-69, 284 
Unitary matrix, 235 

spectral theorem for, 244-45
Unitary representations, 296-98 
Unit ball, 264 
Unit ideal, 331 
Unit vector, 133 
Unity, root of, 497-500 
Upper bound, 518 
Upper triangular matrix, 6 


V


Vandermonde determinant, 511 
Variety, 347 
Vector 
angle between, 242 
column, 2 
coordinate, 78, 90, 416 
fixed, 111 
length of, 242 
nonzero, 113 
null, 236, 252 
orthogonal, 252 




relation, 424 
tangent, 280 
translation, 163 

unit, 133

Vector addition, 78 

Vector bundle, 436 

Vector space, 78-101, 99 
bases and dimension, 86-91 
computing with bases, 91-95 
defined, 84-86 
direct sum, 95-96 
fields, 80-84 
finite-dimensional, 89 
infinite-dimensional, 96-98 
isomorphism of, 85, 91 
subspace, 78-80 


W


Weight, weighted degree, 482 
Well-defined, 180 

Wilson’s theorem, 99 

Word problem, 213 


Z 


Zero 
characteristic, 83, 484 
common, 347 

Zero divisor, 343 

Zero element, 417 

Zero ideal, 331 

Zero map, 328 

Zero matrix, 6 

Zero ring, 324, 414 

Zero vector, 126 

Zorn’s Lemma, 98, 348, 518-19 